PATCHED GENES AND THEIR USE

INTRODUCTION

Technical Field

The field of this invention concerns segment polarity genes and their uses.

Background

Segment polarity genes were discovered in flies as mutations which change
the pattern of structures of the body segments. Mutations in the genes cause animals
to develop the changed patterns on the surfaces of body segments, the changes
affecting the pattern along the head to tail axis. For example, mutations in the gene
patched cause each body segment to develop without the normal structures in the
center of each segment. In their stead is a mirror image of the pattern normally
found in the anterior segment. Thus cells in the center of the segment make the
wrong structures, and point them in the wrong direction with reference to the
overall head-to-tail polarity of the animal. About sixteen genes in the class are known.
The encoded proteins include kinases, transcription factors, a cell junction protein,
two secreted proteins called wingless (WG) and hedgehog (HH), a single
transmembrane protein called patched (PTC), and some novel proteins not related to
any known protein. All of these proteins are believed to work together in signaling
pathways that inform cells about their neighbors in order to set cell fates and
polarities.

Many of the segment polarity proteins of Drosophila and other invertebrates
are closely related to vertebrate proteins, implying that the molecular mechanisms
involved are ancient. Among the vertebrate proteins related to the fly genes are
En-1 and -2, which act in vertebrate brain development, and WNT-1, which is also
involved in brain development, but was first found as the oncogene implicated in
many cases of mouse breast cancer. In flies, the patched gene is transcribed into
RNA in a complex and dynamic pattern in embryos, including fine transverse stripes
in each body segment primordium. The encoded protein is predicted to contain
many transmembrane domains. It has no significant similarity to any other known
protein. Other proteins having large numbers of transmembrane domains include a
variety of membrane receptors, channels through membranes and transporters
through membranes.

The hedgehog (HH) protein of flies has been shown to have at least three
vertebrate relatives: Sonic hedgehog (Shh), Indian hedgehog, and Desert hedgehog.
Shh is expressed in a group of cells at the posterior of each developing limb
bud. This is exactly the same group of cells found to have an important role in
signaling polarity to the developing limb. The signal appears to be graded, with
cells close to the posterior source of the signal forming posterior digits and other
limb structures and cells farther from the signal source forming more anterior
structures. It has been known for many years that transplantation of the signaling
cells, a region of the limb bud known as the "zone of polarizing activity (ZPA)", has
dramatic effects on limb patterning. Implanting a second ZPA anterior to the limb
bud causes a limb to develop with posterior features replacing the anterior ones (in
essence little fingers instead of thumbs). Shh has been found to be the long sought
ZPA signal. Cultured cells making Shh protein (SHH), when implanted into the
anterior limb bud region, have the same effect as an implanted ZPA. This
establishes that Shh is clearly a critical trigger of posterior limb development.

The factor in the ZPA has been thought for some time to be related to
another important developmental signal that polarizes the developing spinal cord.
The notochord, a rod of mesoderm that runs along the dorsal side of early vertebrate
embryos, is a signal source that polarizes the neural tube along the dorsal-ventral
axis. The signal causes the part of the neural tube nearest to the notochord to form
floor plate, a morphologically distinct part of the neural tube. The floor plate, in
turn, sends out signals to the more dorsal parts of the neural tube to further
determine cell fates. The ZPA was reported to have the same signaling effect as the
notochord when transplanted to be adjacent to the neural tube, suggesting the ZPA
makes the same signal as the notochord. In keeping with this view, Shh was found
to be produced by notochord cells and floor plate cells. Tests of extra expression of
Shh in mice led to the finding of extra expression of floor plate genes in cells which
would not normally turn them on. Therefore Shh appears to be a component of the
signal from notochord to floor plate and from floor plate to more dorsal parts of the
neural tube. Besides limb and neural tubes, vertebrate hedgehog genes are also
expressed in many other tissues including, but not limited to, the peripheral nervous
system, brain, lung, liver, kidney, tooth primordia, genitalia, and hindgut and
foregut endoderm.

PTC has been proposed as a receptor for HH protein based on genetic
experiments in flies. A model for the relationship is that PTC acts through a largely
unknown pathway to inactivate both its own transcription and the transcription of the
wingless segment polarity gene. This model proposes that HH protein, secreted
from adjacent cells, binds to the PTC receptor, inactivates it, and thereby prevents
PTC from turning off its own transcription or that of wingless. A number of
experiments have shown coordinate events between PTC and HH.

Relevant Literature

Descriptions of patched, by itself or its role with hedgehog, may be found in
Hooper and Scott, Cell 59, 751-765 (1989); Nakano et al., Nature 341, 508-513
(1989) (both of which also describe the sequence for Drosophila patched); Simcox
et al., Development 107, 715-722 (1989); Hidalgo and Ingham, Development 110,
291-301 (1990); Phillips et al., Development 110, 105-114 (1990); Sampedro and
Guerrero, Nature 353, 187-190 (1991); Ingham et al., Nature 353, 184-187 (1991);
and Taylor et al., Mechanisms of Development 42, 89-96 (1993). Discussions of
the role of hedgehog include Riddle et al., Cell 75, 1401-1416 (1993); Echelard et
al., Cell 75, 1417-1430 (1993); Krauss et al., Cell 75, 1431-1444 (1993); Tabata
and Kornberg, Cell 76, 89-102 (1994); Heemskerk & DiNardo, Cell 76, 449-460
(1994); Roelink et al., Cell 76, 761-775 (1994); and a short review article by
Ingham, Current Biology 4, 347-350 (1994). The sequence for the Drosophila 5'
non-coding region was reported to the GenBank, accession number M28418,
referred to in Hooper and Scott (1989), supra. See also Forbes et al.,
Development 1993 Supplement 115-124.

SUMMARY OF THE INVENTION

Methods for isolating patched genes, particularly mammalian patched genes,
including the mouse and human patched genes, as well as invertebrate patched genes
and sequences, are provided. The methods include identification of patched genes
from other species, as well as members of the same family of proteins. The subject
genes provide methods for producing the patched protein, where the genes and
proteins may be used as probes for research, diagnosis, binding of hedgehog protein
for its isolation and purification, gene therapy, as well as other utilities.



BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a graph having a restriction map of about 10 kbp of the 5' region
upstream from the initiation codon of the Drosophila patched gene and bar graphs of
constructs of truncated portions of the 5' region joined to β-galactosidase, where the
constructs are introduced into fly cell lines for the production of embryos. The
expression of β-gal in the embryos is indicated in the right-hand table during early
and late development of the embryo. The greater the number of +'s, the more
intense the staining.



DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Methods are provided for identifying members of the patched (ptc) gene
family from invertebrate and vertebrate, e.g. mammalian, species, as well as the
entire cDNA sequence of the mouse and human patched gene. Also, sequences for
invertebrate patched genes are provided. The patched gene encodes a
transmembrane protein having a large number of transmembrane sequences.

In identifying the mouse and human patched genes, primers were employed
to move through the evolutionary tree from the known Drosophila ptc sequence.
Two primers are employed from the Drosophila sequence with appropriate
restriction enzyme linkers to amplify portions of genomic DNA of a related
invertebrate, such as mosquito. The sequences are selected from regions which are
not likely to diverge over evolutionary time and are of low degeneracy.
Conveniently, the regions are the N-terminal proximal sequence, generally within
the first 1.5 kb, usually within the first 1 kb, of the coding portion of the cDNA,
conveniently in the first hydrophilic loop of the protein. Employing the polymerase
chain reaction (PCR) with the primers, a band can be obtained from mosquito
genomic DNA. The band may then be amplified and used in turn as a probe. One
may use this probe to probe a cDNA library from an organism in a different branch
of the evolutionary tree, such as a butterfly. By screening the library and
identifying sequences which hybridize to the probe, a portion of the butterfly
patched gene may be obtained. One or more of the resulting clones may then be
used to rescreen the library to obtain an extended sequence, up to and including the
entire coding region, as well as the non-coding 5'- and 3'-sequences. As
appropriate, one may sequence all or a portion of the resulting cDNA coding
sequence.
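
The "low degeneracy" criterion above can be made concrete by counting how many distinct oligonucleotides a degenerate primer stands for. The following is a minimal sketch, not part of the described method: the function name `degeneracy` is hypothetical, the table uses the standard IUPAC ambiguity codes, and inosine ("I") is assumed to count as a single synthesized base. The example string is the degenerate portion of the P4REV primer of Example IV as read from the text, with the linker omitted and one unclear character taken to be T.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
# "I" (inosine) is treated as one synthesized base, so it adds no
# degeneracy; this is an assumption of the sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct oligonucleotides in a degenerate pool."""
    total = 1
    for base in primer.upper():
        # Multiply by the number of alternatives allowed at this position.
        total *= len(IUPAC[base])
    return total


# Degenerate portion of P4REV (assumed reading, linker omitted).
p4rev_core = "YTNGANTGYTTYTGGGA"
print(degeneracy(p4rev_core))
```

A lower count means a larger fraction of the primer pool matches any one target exactly, which is one reason regions with few degenerate positions would be preferred.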

One may then screen a genomic or cDNA library of a species higher in the
evolutionary scale with appropriate probes from one or both of the prior sequences.
Of particular interest is screening a genomic library of a distantly related
invertebrate, e.g. beetle, where one may use a combination of the sequences
obtained from the previous two species, in this case, the Drosophila and the
butterfly. By appropriate techniques, one may identify specific clones which bind to
the probes, which may then be screened for cross hybridization with each of the
probes individually. The resulting fragments may then be amplified, e.g. by
subcloning.

By having all or parts of the 4 different patched genes, in the presently
illustrated example, Drosophila (fly), mosquito, butterfly and beetle, one can now
compare the patched genes for conserved sequences. Cells from an appropriate
mammalian limb bud or other cells expressing patched, such as notochord, neural
tube, gut, lung buds, or other tissue, particularly fetal tissue, may be employed for
screening. Alternatively, adult tissue which produces patched may be employed for
screening. Based on the consensus sequence available from the 4 other species, one
can develop probes where at each site at least 2 of the sequences have the same
nucleotide and, where the site varies such that each species has a unique nucleotide,
inosine, which binds to all 4 nucleotides, may be used.
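
The probe-design rule just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure actually used: the function name `consensus_probe` and the four short input sequences are hypothetical, the inputs are assumed to be already aligned to equal length, and a 2-versus-2 tie (which the text does not address) is resolved arbitrarily in favor of the base seen first.

```python
from collections import Counter


def consensus_probe(aligned: list[str]) -> str:
    """Build a probe from equal-length aligned sequences, one per species.

    At each site the most common base is used if at least two sequences
    share it; if every sequence differs, inosine ("I") is used instead.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    probe = []
    for column in zip(*aligned):
        # most_common(1) returns the top base; ties go to the first seen.
        base, count = Counter(column).most_common(1)[0]
        probe.append(base if count >= 2 else "I")
    return "".join(probe)


# Hypothetical aligned fragments standing in for the four insect species.
example = ["ATGGCA", "ATGGCT", "ACGTCG", "ATGACC"]
print(consensus_probe(example))
```

In the example, only the last column has four different bases, so it is the only position filled with inosine.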

Either PCR may be employed using primers or, if desired, a genomic library
from an appropriate source may be probed. With PCR, one may use a cDNA
library or use reverse transcriptase-PCR (RT-PCR), where mRNA is available from
the tissue. Usually, where fetal tissue is employed, one will employ tissue from the
first or second trimester, preferably the latter half of the first trimester or the second
trimester, depending upon the particular host. The age and source of tissue will
depend to a significant degree on the ability to surgically isolate the tissue based on
its size, the level of expression of patched in the cells of the tissue, the accessibility
of the tissue, the number of cells expressing patched and the like. The amount of
tissue available should be large enough so as to provide for a sufficient amount of
mRNA to be usefully transcribed and amplified. With mouse tissue, limb bud of
from about 10 to 15 dpc (days post conception) may be employed.

In the primers, the complementary binding sequence will usually be at least
14 nucleotides, preferably at least about 17 nucleotides and usually not more than
about 30 nucleotides. The primers may also include a restriction enzyme sequence
for isolation and cloning. With RT-PCR, the mRNA may be enriched in accordance
with known ways, reverse transcribed, followed by amplification with the
appropriate primers. (Procedures employed for molecular cloning may be found in
Molecular Cloning: A Laboratory Manual, Sambrook et al., eds., Cold Spring
Harbor Laboratories, Cold Spring Harbor, NY, 1988). Particularly, the primers may
conveniently come from the N-terminal proximal sequence or other conserved
region, such as those sequences where at least five amino acids are conserved out of
eight amino acids in three of the four sequences. This is illustrated by the sequences
(SEQ ID NO: 11) HTPLDCFWEG, (SEQ ID NO: 12) LIVGG, and (SEQ ID NO: 13)
PFFWEQY. Resulting PCR products of expected size are subcloned and may be
sequenced if desired.
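
One way to read the conservation rule above is as a sliding-window scan over an alignment of the four protein sequences. The sketch below is an interpretation, not the inventors' procedure: a column is taken as conserved when at least three of the four sequences share the same residue, and an eight-residue window qualifies when at least five of its columns are conserved. The function name `conserved_windows` and the variant sequences in the example are hypothetical.

```python
from collections import Counter


def conserved_windows(aligned, window=8, min_conserved=5, min_agree=3):
    """Return start indices of alignment windows meeting the rule.

    aligned: equal-length protein sequences, "-" marking alignment gaps.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # Mark each column as conserved if some non-gap residue is shared
    # by at least `min_agree` of the sequences.
    column_ok = [
        any(n >= min_agree for res, n in Counter(col).items() if res != "-")
        for col in zip(*aligned)
    ]
    hits = []
    for start in range(len(column_ok) - window + 1):
        if sum(column_ok[start:start + window]) >= min_conserved:
            hits.append(start)
    return hits


# Hypothetical variants around the HTPLDCFWEG motif (SEQ ID NO: 11).
alignment = ["HTPLDCFWEG", "HTPLDCFWEG", "HSPLECFWEG", "QTALDCFWQG"]
print(conserved_windows(alignment))
```

Windows that pass could then be back-translated into candidate primers within the 14 to 30 nucleotide range given above.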

The cloned PCR fragment may then be used as a probe to screen a cDNA
library of mammalian tissue cells expressing patched, where hybridizing clones may
be isolated under appropriate conditions of stringency. Again, the cDNA library
should come from tissue which expresses patched, which tissue will come within the
limitations previously described. Clones which hybridize may be subcloned and
rescreened. The hybridizing subclones may then be isolated and sequenced or may
be further analyzed by employing RNA blots and in situ hybridizations in whole and
sectioned embryos. Conveniently, a fragment of from about 0.5 to 1 kbp of the
N-terminal coding region may be employed for the Northern blot.

The mammalian gene may be sequenced and, as described above, conserved
regions identified and used as primers for investigating other species. The
N-terminal proximal region, the C-terminal region or an intermediate region may be
employed for the sequences, where the sequences will be selected having minimum
degeneracy and the desired level of conservation over the probe sequence.

The DNA sequence encoding PTC may be cDNA or genomic DNA or
fragment thereof, particularly complete exons from the genomic DNA, may be
isolated as the sequence substantially free of wild-type sequence from the
chromosome, may be a 50 kbp fragment or smaller fragment, may be joined to
heterologous or foreign DNA, which may be a single nucleotide, oligonucleotide of
up to 50 bp, which may be a restriction site or other identifying DNA for use as a
primer, probe or the like, or a nucleic acid of greater than 50 bp, where the nucleic
acid may be a portion of a cloning or expression vector, comprise the regulatory
regions of an expression cassette, or the like. The DNA may be isolated, purified
being substantially free of proteins and other nucleic acids, be in solution, or the
like.

The subject gene may be employed for producing all or portions of the
patched protein. The subject gene or fragment thereof, generally a fragment of at
least 12 bp, usually at least 18 bp, may be introduced into an appropriate vector for
extrachromosomal maintenance or for integration into the host. Fragments will
usually be immediately joined at the 5' and/or 3' terminus to a nucleotide or
sequence not found in the natural or wild-type gene, or joined to a label other than a
nucleic acid sequence. For expression, an expression cassette may be employed,
providing for a transcriptional and translational initiation region, which may be
inducible or constitutive, the coding region under the transcriptional control of the
transcriptional initiation region, and a transcriptional and translational termination
region. Various transcriptional initiation regions may be employed which are
functional in the expression host. The peptide may be expressed in prokaryotes or
eukaryotes in accordance with conventional ways, depending upon the purpose for
expression. For large production of the protein, a unicellular organism or cells of a
higher organism, e.g. eukaryotes such as vertebrates, particularly mammals, may be
used as the expression host, such as E. coli, B. subtilis, S. cerevisiae, and the like.
In many situations, it may be desirable to express the patched gene in a mammalian
host, whereby the patched protein will be transported to the cellular membrane for
various studies. The protein has two parts which provide for a total of six
transmembrane regions, with a total of six extracellular loops, three for each part.
The character of the protein has similarity to a transporter protein. The protein has
two conserved glycosylation signal triads.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; as an antisense sequence; or the like.
Modifications may include replacing oxygen of the phosphate esters with sulfur or
nitrogen, replacing the phosphate with phosphoramide, etc.

With the availability of the protein in large amounts by employing an
expression host, the protein may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified protein will generally
be at least about 80% pure, preferably at least about 90% pure, and may be up to
100% pure. By pure is intended free of other proteins, as well as cellular debris.

The polypeptide may be used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, whereas
larger fragments or the entire gene allow for the production of antibodies over the
surface of the polypeptide or protein, where the protein may be in its natural
conformation.

Antibodies may be prepared in accordance with conventional ways, where
the expressed polypeptide or protein may be used as an immunogen, by itself or

obtain further 5' sequence to ensure that one has at least a functional portion of the
enhancer. It is found that the enhancer is proximal to the 5' coding region, a
portion being in the transcribed sequence and downstream from the promoter
sequences. The transcriptional initiation region may be used for many purposes,
studying embryonic development, providing for regulated expression of patched
protein or other protein of interest during embryonic development or thereafter, and
in gene therapy.

The gene may also be used for gene therapy, by transfection of the normal
gene into embryonic stem cells or into mature cells. A wide variety of viral vectors
can be employed for transfection and stable integration of the gene into the genome
of the cells. Alternatively, micro-injection may be employed, fusion, or the like for
introduction of genes into a suitable host cell. See, for example, Dhawan et al.,
Science 254, 1509-1512 (1991) and Smith et al., Molecular and Cellular Biology
(1990) 3268-3271.

By providing for the production of large amounts of PTC protein, one can
use the protein for identifying ligands which bind to the PTC protein. Particularly,
one may produce the protein in cells and employ the polysomes in columns for
isolating ligands for the PTC protein. One may incorporate the PTC protein into
liposomes by combining the protein with appropriate lipid surfactants, e.g.
phospholipids, cholesterol, etc., and sonicate the mixture of the PTC protein and the
surfactants in an aqueous medium. With one or more established ligands, e.g.
hedgehog, one may use the PTC protein to screen for antagonists which inhibit the
binding of the ligand. In this way, drugs may be identified which can prevent the
transduction of signals by the PTC protein in normal or abnormal cells.

The PTC protein, particularly binding fragments thereof, the gene encoding
the protein, or fragments thereof, particularly fragments of at least about 18
nucleotides, frequently of at least about 30 nucleotides and up to the entire gene,
more particularly sequences associated with the hydrophilic loops, may be employed
in a wide variety of assays. In these situations, the particular molecules will
normally be joined to another molecule, serving as a label, where the label can
directly or indirectly provide a detectable signal. Various labels include
radioisotopes, fluorescers, chemiluminescers, enzymes, specific binding molecules,
particles, e.g. magnetic particles, and the like. Specific binding molecules include
pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific
binding members, the complementary member would normally be labeled with a
molecule which provides for detection, in accordance with known procedures. The
assays may be used for detecting the presence of molecules which bind to the
patched gene or PTC protein, in isolating molecules which bind to the patched gene,
for measuring the amount of patched, either as the protein or the message, for
identifying molecules which may serve as agonists or antagonists, or the like.

Various formats may be used in the assays. For example, mammalian or
invertebrate cells may be designed where the cells respond when an agonist binds to
PTC in the membrane of the cell. An expression cassette may be introduced into
the cell, where the transcriptional initiation region of patched is joined to a marker
gene, such as β-galactosidase, for which a substrate forming a blue dye is available.
A 1.5 kb fragment that responds to PTC signaling has been identified and shown to
regulate expression of a heterologous gene during embryonic development. When
an agonist binds to the PTC protein, the cell will turn blue. By employing a
competition between an agonist and a compound of interest, absence of blue color
formation will indicate the presence of an antagonist. These assays are well known
in the literature. Instead of cells, one may use the protein in a membrane
environment and determine binding affinities of compounds. The PTC may be
bound to a surface and a labeled ligand for PTC employed. A number of labels
have been indicated previously. The candidate compound is added with the labeled
ligand in an appropriate buffered medium to the surface bound PTC. After an
incubation to ensure that binding has occurred, the surface may be washed free of
any non-specifically bound components of the assay medium, particularly any
non-specifically bound labeled ligand, and any label bound to the surface determined.
Where the label is an enzyme, substrate producing a detectable product may be used.
The label may be detected and measured. By using standards, the binding affinity of
the candidate compound may be determined.
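
The text does not say how the binding affinity is to be calculated from the competition measurement. One widely used approach for simple competitive binding at equilibrium is the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), sketched below as an assumption rather than as the method of this disclosure. The function name `cheng_prusoff_ki` and all numerical values are hypothetical.

```python
def cheng_prusoff_ki(ic50: float, ligand_conc: float, ligand_kd: float) -> float:
    """Estimate the inhibition constant of a competing compound.

    ic50:        concentration of candidate that halves bound label
    ligand_conc: concentration of labeled ligand in the assay
    ligand_kd:   dissociation constant of the labeled ligand
    All three must be expressed in the same concentration unit.
    """
    if ic50 <= 0 or ligand_kd <= 0 or ligand_conc < 0:
        raise ValueError("concentrations must be positive")
    return ic50 / (1.0 + ligand_conc / ligand_kd)


# Hypothetical values in nM: labeled ligand used at its Kd.
print(cheng_prusoff_ki(ic50=100.0, ligand_conc=10.0, ligand_kd=10.0))
```

When the labeled ligand is used at a concentration equal to its Kd, as in the example, the estimated Ki is half the measured IC50.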
The availability of the gene and the protein allows for investigation of the
development of the fetus and the role patched and other molecules play in such
development. By employing antisense sequences of the patched gene, where the
sequences may be introduced in cells in culture, or a vector providing for
transcription of the antisense of the patched gene introduced into the cells, one can
investigate the role the PTC protein plays in the cellular development. By providing
for the PTC protein or fragment thereof in a soluble form which can compete with
the normal cellular PTC protein for ligand, one can inhibit the binding of ligands to
the cellular PTC protein to see the effect of variation in concentration of ligands for
the PTC protein on the cellular development of the host. Antibodies against PTC
can also be used to block function, since PTC is exposed on the cell surface.

The subject gene may also be used for preparing transgenic laboratory
animals, which may serve to investigate embryonic development and the role the
PTC protein plays in such development. By providing for variation in the
expression of the PTC protein, employing different transcriptional initiation regions
which may be constitutive or inducible, one can determine the developmental effect
of the differences in PTC protein levels. Alternatively, one can use the DNA to
knock out the PTC protein in embryonic stem cells, so as to produce hosts with only
a single functional patched gene or where the host lacks a functional patched gene.
By employing homologous recombination, one can introduce a patched gene which
is differentially regulated, for example, is expressed during the development of the
fetus but not in the adult. One may also provide for expression of the patched gene in
cells or tissues where it is not normally expressed or at abnormal times of
development. One may provide for mis-expression or failure of expression in
certain tissue to mimic a human disease. Thus, mouse models of spina bifida or
abnormal motor neuron differentiation in the developing spinal cord are made
available. In addition, by providing expression of PTC protein in cells in which it is
otherwise not normally produced, one can induce changes in cell behavior upon
binding of ligand to the PTC protein.

Areas of investigation may include the development of cancer treatments.
The wingless gene, whose transcription is regulated in flies by PTC, is closely
related to a mammalian oncogene, Wnt-1, a key factor in many cases of mouse
breast cancer. Other Wnt family members, which are secreted signaling proteins,
are implicated in many aspects of development. In flies, the signaling factor
decapentaplegic, a member of the TGF-beta family of signaling proteins, known to
affect growth and development in mammals, is also controlled by PTC. Since
members of both the TGF-beta and Wnt families are expressed in mice in places
close to overlapping with patched, the common regulation provides an opportunity
in treating cancer. Also, for repair and regeneration, proliferation competent cells
making PTC protein can find use to promote regeneration and healing for damaged
tissue, which tissue may be regenerated by transfecting cells of damaged tissue with
the ptc gene and its normal transcription initiation region or a modified transcription
initiation region. For example, PTC may be useful to stimulate growth of new teeth
by engineering cells of the gums or other tissues where PTC protein was during an
earlier developmental stage or is expressed.

Since Northern blot analysis indicates that ptc is present at high levels in
adult lung tissue, the regulation of ptc expression or binding to its natural ligand
may serve to inhibit proliferation of cancerous lung cells. The availability of the
gene encoding PTC and the expression of the gene allows for the development of
agonists and antagonists. In addition, PTC is central to the ability of neurons to
differentiate early in development. The availability of the gene allows for the
introduction of PTC into host diseased tissue, stimulating the fetal program of
division and/or differentiation. This could be done in conjunction with other genes
which provide for the ligands which regulate PTC activity or by providing for
agonists other than the natural ligand.

The availability of the coding region for various ptc genes from various
species allows for the isolation of the 5' non-coding region comprising the promoter
and enhancer associated with the ptc genes, so as to provide transcriptional and
post-transcriptional regulation of the ptc gene or other genes, which allow for regulation
of genes in relation to the regulation of the ptc gene. Since the ptc gene is
autoregulated, activation of the ptc gene will result in activation of transcription of a
gene under the transcriptional control of the transcriptional initiation region of the
ptc gene. The transcriptional initiation region may be obtained from any host
species and introduced into a heterologous host species, where such initiation region
is functional to the desired degree in the foreign host. For example, a fragment of
from about 1.5 kb upstream from the initiation codon, up to about 10 kb, preferably
up to about 5 kb, may be used to provide for transcriptional initiation regulated by

accession no. M28418).



I.

Primers P2R1 (SEQ ID NO:14): GGACGAATTCAARGTNCAYCARYTNTGG and
P4R1 (SEQ ID NO:15): GGACGAATTCCYTCCCARAARCANTC (the underlined
sequences are Eco RI linkers) amplified a band from mosquito
genomic DNA using the PCR. Program conditions were as follows:

94 °C 4 min.; 72 °C Add Taq;
[(temperature illegible) 30 sec.; 72 °C 90 sec.; 94 °C 15 sec.] 3 times
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

This band was subcloned into the EcoRV site of pBluescript II and sequenced using
the USB Sequenase kit.

II. Screen of a Butterfly cDNA Library with Mosquito PCR Product

Using the mosquito PCR product (SEQ ID NO:7) as a probe, a 3 day
embryonic Precis coenia λgt10 cDNA library (generously provided by Sean
Carroll) was screened. Filters were hybridized at [...] °C overnight in a solution
containing 5X SSC, 10% dextran sulfate, 5[...]
salmon sperm DNA, and 0.5% SDS. Filters were washed in 0.1X SSC, 0.1% SDS
[...]. Of the 100,000 plaques initially screened, 2 clones, L[?] and L[?], were isolated
which corresponded to the N terminus of butterfly ptc. Using L[?] as a probe, the
library filters were rescreened and 3 additional clones (L5, L7, L[?]) were isolated
which encompassed the remainder of the ptc coding sequence. The full length
sequence of butterfly ptc (SEQ ID NO:3) was determined by ABI automated
sequencing.

III. Screen of a Tribolium (beetle) Genomic Library with Mosquito PCR Product
and 900 bp Fragment from the Butterfly Clone

A λgem11 genomic library from Tribolium casteneum (gift of Rob Denell)
was probed with a mixture of the mosquito PCR (SEQ ID NO:7) product and
BstXI/EcoRI fragment of L2. Filters were hybridized at 55 °C overnight and
washed as above. Of the 75,000 plaques screened, 14 clones were identified and the
SacI fragment of T8 (SEQ ID NO:1), which crosshybridized with the mosquito and
butterfly probes, was subcloned into pBluescript.

IV. PCR on Mouse cDNA Using Degenerate Primers Derived from Regions
Conserved in the Four Insect Homologues
Two degenerate PCR primers (P4REV: (SEQ ID NO: 16)
GGACGAATTCYTNGANTGYTTYTGGGA; P22: (SEQ ID NO: 17)
CATACCAGCCAAGCTTGT[...]GGCCARTGCAT) were designed based on a
comparison of PTC amino acid sequences from fly (Drosophila melanogaster) (SEQ
ID NO:6), mosquito (Anopheles gambiae) (SEQ ID NO:8), butterfly (Precis
coenia) (SEQ ID NO:4), and beetle (Tribolium casteneum) (SEQ ID NO:2). I
represents inosine, which can form base pairs with all four nucleotides. P22 was
used to reverse transcribe RNA from 12.5 dpc mouse limb bud (gift from David
Kingsley) for 90 min at 37 °C. PCR using P4REV (SEQ ID NO: 16) and P22 (SEQ
ID NO: 17) was then performed on 1 µl of the resultant cDNA under the following
conditions:

94 °C 4 min.; 72 °C Add Taq;
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold
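
As a quick planning aid, the programmed hold time of the cycling conditions above can be totalled. This is only an arithmetic sketch: the function name `programmed_seconds` is hypothetical, and the total excludes ramp times between temperatures, the pause at 72 °C for adding Taq, and the open-ended 4 °C hold.

```python
# Each step is (temperature in degrees C, hold time in seconds).
initial = [(94, 240)]                    # 94 C for 4 min
cycle = [(94, 15), (50, 30), (72, 90)]   # one amplification cycle
n_cycles = 35
final = [(72, 600)]                      # 72 C for 10 min


def programmed_seconds(initial, cycle, n_cycles, final):
    """Sum the hold times of a PCR program, ignoring ramping."""
    per_cycle = sum(seconds for _, seconds in cycle)
    fixed = sum(seconds for _, seconds in initial + final)
    return fixed + n_cycles * per_cycle


total = programmed_seconds(initial, cycle, n_cycles, final)
print(f"{total} s = {total / 60:.2f} min")
```

Actual run time on an instrument will be longer because of heating and cooling between steps.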
PCR products of the expected size were subcloned into the TA vector (Invitrogen)
and sequenced with the Sequenase Version 2.0 DNA Sequencing Kit (U.S.B.).
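To make the notion of a degenerate primer concrete, the sketch below counts how many distinct oligonucleotides a degenerate sequence stands for. It uses the standard IUPAC ambiguity codes and, as an assumption following the statement that inosine pairs with all four nucleotides, treats I like N. Only the degenerate 3' portions of P4REV and P22 are used; the function and constant names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes. "I" (inosine) is treated
# as matching any base, which is an assumption made for counting only.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "I": "ACGT",
}


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligo."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo: str):
    """Yield every concrete sequence encoded by a degenerate oligo."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ("".join(combo) for combo in product(*choices))


# Degenerate 3' portions of the two primers (5' tails omitted).
P4REV_CORE = "YTNGANTGYTTYTGGGA"
P22_CORE = "GGCCARTGCAT"

print(degeneracy(P4REV_CORE), degeneracy(P22_CORE))
print(sorted(expand(P22_CORE)))
```

The count is the product of the number of bases allowed at each position, so each Y or R doubles it and each N quadruples it.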
Using the cloned mouse PCR fragment as a probe, 300,000 plaques of a
mouse 8.5 dpc λgt10 cDNA library (a gift from Brigid Hogan) were screened at
65 °C as above and washed in 2x SSC, 0.1% SDS at room temperature. 7 clones
were isolated, and three (M2, M4, and M8) were subcloned into pBluescript II.
200,000 plaques of this library were rescreened using first, a 1.1 kb EcoRI fragment
from M2 to identify 6 clones (M9-M16) and secondly a mixed probe containing the
most N terminal (XhoI fragment from M2) and most C terminal sequences
(BamHI/BglII fragment from M9) to isolate 5 clones (M17-M21). M9, M10, M14,
and M17-21 were subcloned into the EcoRI site of pBluescript II (Stratagene).



V. 

Northerns:

A mouse embryonic Northern blot and an adult multiple tissue Northern blot
(obtained from Clontech) were probed with a 900 bp EcoRI fragment from an N
terminal coding region of mouse ptc. Hybridization was performed at 65 °C in 5x
SSPE, 10x Denhardt's, 100 µg/ml sonicated salmon sperm DNA, and 1% SDS.
After several short room temperature washes in 2x SSC, 0.05% SDS, the blots were
washed at high stringency in 0.1X SSC, 0.1% SDS at 50 °C.
In situ hybridization of sections:

7.75, 8.5, 11.5, and 13.5 dpc mouse embryos were dissected in PBS and
frozen in Tissue-Tek medium at -80 °C. 12-16 µm frozen sections were cut,
collected onto VectaBond (Vector Laboratories) coated slides, and dried for 30-60
minutes at room temperature. After a 10 minute fixation in 4% paraformaldehyde in
PBS, the slides were washed 3 times for 3 minutes in PBS, acetylated for 10 minutes
in 0.25% acetic anhydride in triethanolamine, and washed three more times for 5
minutes in PBS. Prehybridization (50% formamide, 5X SSC, 250 µg/ml yeast
tRNA, 500 µg/ml sonicated salmon sperm DNA, and 5x Denhardt's) was carried
out for 6 hours at room temperature in 50% formamide/5x SSC humidified
chambers. The probe, which consisted of 1 kb from the N-terminus of ptc, was
added at a concentration of 200-1000 ng/ml into the same solution used for
prehybridization, and then denatured for five minutes at 80 °C. Approximately 75
µl of probe were added to each slide and covered with Parafilm. The slides were
incubated overnight at 65 °C in the same humidified chamber used previously. The
following day, the probe was washed successively in 5X SSC (5 minutes, 65 °C),
0.2X SSC (1 hour, 65 °C), and 0.2X SSC (10 minutes, room temperature). After
five minutes in buffer B1 (0.1 M maleic acid, 0.15 M NaCl, pH 7.5), the slides were
blocked for 1 hour at room temperature in 1% blocking reagent (Boehringer-Mannheim)
in buffer B1, and then incubated for 4 hours in buffer B1 containing the
DIG-AP conjugated antibody (Boehringer-Mannheim) at a 1:5000 dilution. Excess
antibody was removed during two 15 minute washes in buffer B1, followed by five
minutes in buffer B3 (100 mM Tris, 100 mM NaCl, 5 mM MgCl2, pH 9.5). The
antibody was detected by adding an alkaline phosphatase substrate (350 µl 75 mg/ml
X-phosphate in DMF, 450 µl 50 mg/ml NBT in 70% DMF in 100 mls of buffer B3)
and allowing the reaction to proceed overnight in the dark. After a brief rinse in 10
mM Tris, 1 mM EDTA, pH 8.0, the slides were mounted with Aquamount (Lerner
Laboratories).
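For orientation, the working concentrations implied by the substrate recipe above can be estimated with a short calculation. This is only a sketch: it assumes that the 450 figure for NBT is in microliters like the X-phosphate volume, that the stock volumes add to the 100 ml of buffer B3, and the function name is hypothetical.

```python
def final_conc_mg_per_ml(stock_mg_per_ml: float,
                         stock_ul: float,
                         total_ml: float) -> float:
    """Concentration after diluting stock_ul of stock into total_ml."""
    mass_mg = stock_mg_per_ml * (stock_ul / 1000.0)  # µl converted to ml
    return mass_mg / total_ml


BUFFER_B3_ML = 100.0
X_PHOSPHATE_STOCK = (75.0, 350.0)  # (mg/ml, µl), as given in the text
NBT_STOCK = (50.0, 450.0)          # (mg/ml, µl); the µl unit is assumed

# Total volume = buffer plus both stock additions (assumed additive).
total_ml = BUFFER_B3_ML + (X_PHOSPHATE_STOCK[1] + NBT_STOCK[1]) / 1000.0

x_phosphate = final_conc_mg_per_ml(*X_PHOSPHATE_STOCK, total_ml)
nbt = final_conc_mg_per_ml(*NBT_STOCK, total_ml)

print(f"X-phosphate: {x_phosphate:.3f} mg/ml, NBT: {nbt:.3f} mg/ml")
```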

VI. Drosophila 5'-transcriptional initiation region β-gal constructs.

A series of constructs were designed that link different regions of the ptc
promoter from Drosophila to a LacZ reporter gene in order to study the cis
regulation of the ptc expression pattern. See Fig. 1. A 10.8 kb BamHI/BspM1
fragment comprising the 5'-non-coding region of the mRNA at its 3'-terminus was
obtained and truncated by restriction enzyme digestion as shown in Fig. 1. These

expression cassettes were introduced into Drosophila lines using a P-element vector
(Thummel et al., Gene 74, 445-456 (1988)), which were injected into embryos,
providing flies which could be grown to produce embryos. (See Spradling and
Rubin, Science (1982) 218, 341-347 for a description of the procedure.) The vector
used a pUC8 background into which was introduced the white gene to provide for
yellow eyes, portions of the P-element for integration, and the constructs were
inserted into a polylinker upstream from the LacZ gene. The resulting embryos
were stained using antibodies to LacZ protein conjugated to HRP and the embryos
developed with OPD dye to identify the expression of the LacZ gene. The staining
pattern is described in Fig. 1, indicating whether there was staining during the early
and late development of the embryo.

VII. Isolation of a Mouse ptc Gene
butterfly PTC respectively.

25 lit;: »^ -."Cin,. i. ^ 
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.bu«*«.b, „ »d .5 d^. Wbile *e gene is sSU p«se». « .7 dpo^«» 

spleen, skeletal muscle, and testes. 
,0 vnb , 11 liiii mmr n f- ^^"^ ^"'^ " 
no d««»bU s,r»l i. -cSons ftom 7.75 dpc «nbr,os. TOs di«^cy « 

I, .he la. »is or « dpc en^ryos. B, 1. .5 dpc. p,c ^ be d^Ced . *e 
.5 de«.oping.«ngb».sa».g». consist, wi-h Norton, P.^.^ 

„e«,ussys«m.a»wdl«in*«™.li,mu«softt«P«»e»c.Ph^ J.«.sal«. 
i;,Lcrtb«.i..*ecc«^si.«c»«a.'»''' '»- 
JO sde«>«.n»»d..en-y.b™»b<».i.««ve«b«lco.u».p«|Sl««».«^ 
wide range of tissues from endodermal, mesodermal, and ectodermal origin
supporting its fundamental role in embryonic development.

VIII. Isolation of Human ptc Gene
« ■ TO isoUte humane (hp«). 2 x iC p1a,>«s .r^n a h.m». lung cDNA Ubrary 

(HUOJ:., Ctonetech, w«. screened with . Ikbp ' 

de«r», sumte. 5X Denhardfs. 0.2 mg/m. sataon spenn ■'"^^"^ » '» 

SDS). Two positive plaques (H1 and H2) were isolated, the inserts cloned into
pBluescript, and upon sequencing, both contained sequence highly similar to the
mouse ptc homolog. To isolate the 5' end, an additional 6 x 10^5 plaques were
screened in duplicate with M2-3 EcoRI and M2-3 XhoI (containing 5' untranslated
sequence of mouse ptc) probes. Ten plaques were purified and of these, 6 inserts
were subcloned into pBluescript. To obtain the full coding sequence, H2 was fully
and H14, H20, and H21 were partially sequenced. The 5.1 kbp of human ptc
sequence (SEQ ID NO:18) contains an open reading frame of 1447 amino acids
(SEQ ID NO:19) that is 96% identical and 98% similar to mouse ptc. The 5' and 3'
untranslated sequences of human ptc (SEQ ID NO:18) are also highly similar to
mouse ptc (SEQ ID NO:09) suggesting conserved regulatory sequence.

Comparison of Mouse, Human, Fly and Butterfly Sequences

The deduced mouse PTC protein sequence (SEQ ID NO:10) has about 38%
identical amino acids to fly PTC over about 1,200 amino acids. This amount of
conservation is dispersed through much of the protein excepting the C-terminal
region. The mouse protein also has a 50 amino acid insert relative to the fly
protein. Based on the sequence conservation of PTC and the functional conservation
of hedgehog between fly and mouse, one concludes that ptc functions similarly in
the two organisms. A comparison of the amino acid sequences of mouse (mptc)
(SEQ ID NO:10), human (hptc) (SEQ ID NO:19), butterfly (bptc) (SEQ ID NO:4)
and Drosophila (ptc) (SEQ ID NO:6) is shown in Table 1.
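Percent identity figures such as those quoted above are computed from an alignment. The sketch below shows one common convention, counting identical residues over aligned columns in which neither sequence has a gap. It is an illustration only: the specification does not state which convention or program was used, the function name is hypothetical, and the two strings are made-up toy sequences, not PTC sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row is a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned rows (invented for illustration only).
row_a = "MKT-LLV"
row_b = "MRTALLI"
print(f"{percent_identity(row_a, row_b):.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so reported values are comparable only when the denominator is the same.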

TABLE 1

alignment of human, mouse, fly, and butterfly PTC homologs

BPTC fAPr^sEAPsSSpciSJ:::::::^^^^^ 
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S!5!S?;^!SSSvDIEWRLKrLCYSPSIPDFEGYHHIESlIDNVIPC^^ 
*. ♦ * ,*. .* * •• •• • • 

GRKLQSGTAVLLGKPPLR— -WNFDPLEFLEELK 

SSS4"S?JJ5hSklqwthiiipi^vveevk-kl---kfqfpi^ti^ 
*,,* * . * * **..* •••* •• • 

SSSsJSSsFYPFSTSTLNDILGKFSEVSLKNIILGYMniLIV^^^ 

SKSQGAVGIA(WIJ,V7aSVAAGLGI.CSLIGISFHAATTQVLPElAL^ 
SKS0^VGLAGVU.VM.SV»AGLG1«SL1GISFHAATTQVLPFLW 
?5SSi^uilCFSTWU5I^l^AliGIVFNAASTQWPriA^^ 
l5SA<5VGIAGVa.IJ^ITVWVGLGFCALLGIPFNMSTW 

SETGQ»n«IPFEDRTGECIjaTGASVALTSIS»JVTAFETJ^^ 

•SETGOTnCRIPFEORTGECUWTGRSVALTSISNVTAFFMAALIPIPALWVFSLQHRvv^^ 

JJsS^--Seqtkliu«vgfsilfsacstagsffaaafipvpalkv^^ 

^SGD--VpSESGLVUaCSGLSVLIJ^LCNVMAFlJ^LPIP^^^^ 

FNmiVLLIFPMLSMDLyRREDRRU)IFCCFTSPCVSRVIQ>mPOAYTOT^^ 
SlwIviAIFPAll^LYRRBDRRU)irCCFTSPCVSRVIQVEPOAVTEMS^^ 

JUCTRKNDKTHRID-TTRQPLDPDVS 

— — — — * ^ 

ESTSSTRDU£QFSDSSUICLEPPCTKirrMSFAEKHYAPFU.KPKAK>nrVIFLF^ 

°i^:::;~::~^;.--4SS5iAPrimFAVKVTSKL«.iAvii 

wSSSSSi™ivP^NEHKFLnAQTRLFGFySMYAVTQ<Wrey^ 
?l!5^SSSiLTDIVPENTDEHEFLSRQEKYFGF™«AyTQ^^^ 
* *,** .. * *. * . *.**.♦* 
The identity of ten other clones recovered from the mouse library is not
determined. These cDNAs cross-hybridize with mouse ptc sequence, while differing
as to their restriction maps. These genes encode a family of proteins related to the
patched protein. Alignment of the human and mouse nucleotide sequences, which
includes coding and noncoding sequence, reveals 89% identity.

In accordance with the subject invention, mammalian patched genes, including
the mouse and human genes, are provided which allow for high level production of
the patched protein, which can serve many purposes. The patched protein may be
used in a screening for agonists and antagonists, for isolation of its ligand,
particularly hedgehog, more particularly Sonic hedgehog, and for assaying for the
transcription of the mRNA ptc. The protein or fragments thereof may be used to
produce antibodies specific for the protein or specific epitopes of the protein. In
addition, the gene may be employed for investigating embryonic development, by
screening fetal tissue, preparing transgenic animals to serve as models, and the like.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of the teachings of this invention
that certain changes and modifications may be made thereto without departing from
the spirit or scope of the appended claims.
(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert

(B) STREET: Four Embarcadero Center, Suite 3400

(D) STATE: CA

(F) ZIP: 94111



(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US95/ 

(B) FILING DATE: 06-OCT-1995 

(C) CLASSIFICATION: 



(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Rowland, Bertram I

(B) REGISTRATION NUMBER: 20015

(C) REFERENCE/DOCKET NUMBER: a60190-l

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 415-781-1989 

(B) TELEFAX: 415-398-3249 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 736 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
AACNNCNNTN NATGCCACCC CCNCCCAACC TTTNNNCCNN NTAANCAAAA NNCCCCNTTT 60

NATACCCCCT NTAARANTTT TCCACCHNKC NNAAANNCCN CTGNANACNA MGNAAANCCN 120

TTTTTNAACC CCCCCCACCC GGAATTCCNA NTNNCCNCCC CCAAATTACA ACTCCAGNCC 180 

AAAATTHANA MAATTGGTCC TAACCTAACC NATNGTTGTT ACGGTTTCCC CCCCCAAATA 240 

CATGCACTGG CCCGAACACT TGATOGTTCC CGTTCCAATA AGAATAAATC TCCTCATATT 300 

AAACAAGCCN AAAGCTTTAC AAACTGTTGT ACAATTAATO GGCGAACACG AACTGTTCGA 360 

ATTCTGGTCT GGACATTACA AAGTGCACCA CATCXWATCG AACCAGGAGA AGGCCACAAC 420 

CGTACTGAAC GCCTGGCAGA AGAAGTTCGC ACAGGTTGGT GGTTGGCGCA AGGAGTAGAG 480 

TGAATGGTGG TAATTTTTGG TTGTTCCAGG AGGTGGATCG TCTGACGAAG AGCAAGAAGT 540 

CGTOGAATTA CATCTTCGTG ACGTTCTCCA CCCCCAATTT GAACAACATG TTGAAGGAGG 600 

CGTCCAANAC GGACGTGGTG AAGCTGGGGG TGGTGCTGGG GGTGGCGGCG GTOTACGGGT 660 

GGGTGGCCCA GTCGGGGCTG GCTGCCTTGG GAGTGCTGGT CTTNCCGNGC TKCNATTCGC 720 

CCTATAGTNA GNCGTA 736
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 107 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Xaa Pro Pro Pro Asn Tyr Asn Ser Xaa Pro Lys Xaa Xaa Xaa Leu Val
1 5 10 15

Leu Thr Pro Xaa Val Val Thr Val Ser Pro Pro Lys Tyr Met His Trp
20 25 30

Pro Glu His Leu Ile Val Ala Val Pro Ile Arg Ile Asn Leu Val Ile
35 40 45

Leu Asn Lys Pro Lys Ala Leu Gln Thr Val Val Gln Leu Met Gly Glu
50 55 60

His Glu Leu Phe Glu Phe Trp Ser Gly His Tyr Lys Val His His Ile
65 70 75 80

Gly Trp Asn Gln Glu Lys Ala Thr Thr Val Leu Asn Ala Trp Gln Lys
85 90 95

Lys Phe Ala Gln Val Gly Gly Trp Arg Lys Glu
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CCOTCTOTCA CCCCCAOCCC CAGTCCCCOC CCCCCAOCAC CGTCCTCacc AGCCOAOCCK: 60
CCAOCCCCCC CCOOAOCC«C CGOCGCCOOC CCCAACATGC CCTCCCCTGC TAACOCCGCC 120
OCOCCCCTCG GCAOOCACCC CGGCGGCOCG AGGCGCAGAC GCACCGGGCG ACCGCACCGC 180
CCCGCCCCGG ACCGGGACTA TCTGCACCGC CCCACCTACT GCGACGCCGC CTTCGCXCTG 240
OAGCAGATTT CCAAGGGGAA GGCTACTGGC CGGAAAGCGC CGCTCTGGCT OAGAGCGAAG 300
TTTCAGAGAC TCTTATTTAA ACTGGGT«„ TACATTCAAA AGAACTGCGO CAAGTTTTTG 360
GTTGTGGGTC TCCTCATATT TGGOGCCTTC GCXGTOGOAT TAAAGGCACC TAATCTCGAG 420
ACCAACGTGO AGGAGCTOTG GOTGGAAGTT GGTGGACGAC TGAGTCGAGA ATTAAATTAT 480
ACCCGTCAGA AGATAGGAOA AGAGGCTATC TTTAATCCTC AACTCATGAT ACAGACTCCA 540
AAAGAAGAAG GCGCTAATGT TCTGACCACA GAGGCTCTCC TGCAACACCT GGACTCAGCA 600
CTCCAGGCCA GTCGTGTGCA CGTCTACATG TATAACAGGC AAXGGAAGTT GGAACATTTG 660
TCCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 720
CTTTACCCTT GCTTAATCAT TACACCTTXO GACTGCTTCT GGGAAGGGGC AAAGCTACAG 780
TCCGGGACAG CATACCTCCT AOGTAAGCCT CCTTTACGGT GGACAAACTT TGACCCCTTG 840
OAATTCCTAG AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGC«^ OGAAATGC^; 900
AATAAACCCG AAOT«««XA TGGGTACATG GACCGCCCTT GCCTCAACCC AGCCGACCCA 960
CATTGCCCTO CCACAOCCCC TAACAAAAAT TCAACCAAAC CTCTTGA«T GGCCCTTGTT 1020
TTGAATGGTG GATGTCAAGG TTTATCCAGG AAGTATA«C ATTGGCAGGA GGAGTTGArr 1080
GTGGOTOGTA COOTCAAGAA TCCCACTOGA AAACTTOTCA GCOCICACCC CCTGCAAACC 1140
AIGTTCCAGT TAA«,ACTCC CAAGCAAA«, TATGAACACT TCAGGGGCTA CGACTATGTC 1200
TCXCACATCA ACTGGAA«« AGACAGGGCA GCCGCCATCC «»AGGCCTG GCAGAGGACT 1260
TACCTGOAGG TGCTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGOT CCTTCCCTTC 1320
ACAACCACQA CCCTGGACOA CATCCTAAAA TCCTTCTCTG ATGTCAGTGT CATCCGAGTG 1380
GCCAGCCGCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440
TCCAAGTCCC AGGCTGCCGT GGOGCTGGCT GGCGTCCTGT TGGTTGCCCT GTCAGTGGCT 1500
GCAGGATTGG GCCTCTCCTC CTTOATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 1560
TTGCCGTTTC TTGCTCTTCG TGTTGGTOTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620
AGTGAAACAG GACAGAATAA GAGGATTCCA TTTCAGCACA GGACTGGGGA GTGCCTCAAG 1680
CGCACCGGAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGGCC 1740
GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800
TTCAATTTTG CTAT6GTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860
CGT6AGGACA GAAGATTOGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980
CCCCCATACA CCAGCCACAG CTTCCCCCAC GAAACCCATA TCACTATCCA GTCCACCGTT 2040
CAGCTCOGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 2100
TCTGAGATCT CTGTACAOCC TGTTACCCTC ACCCAGGACA ACCTCAGCTG TCAGAOTCCC 2160
GAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220
CTCGAGCCCC CCT6CACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280
TTCCTCCTOA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGCGG 2340
CTCAGCCTTT ATGGGACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGOA CATTGTTCCC 2400
CGGGAAACCA GAGAATATGA CTTCATACCT GCCCAGTTCA AGTACTTCTC TTTCTACAAC 2460
ATCTATATAG TCACCCAGAA A6CA6ACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520
CATAAGAGTT TCAGCAATGT GAAGTATGTC ATGCTGGAGO AGAACAAGCA ACTTCCCCAA 2580
AT6TGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640
TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCA6ATGA CGGGGTCCTC 2700
GCTTACAAAC TCCTGCTGCA GACTCGCAGC OGAGACAAGC CCATCGACAT TAGTCAGTTG 2760
ACTAAACAGC GTCTOGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820
CTGACCGCTT GGGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCCGG 2880
CCTCACCGGC CGGAGTGGGT CCATGACAAA GCCGACTACA TGCCAGAGAC CAGGCTGAOA 2940
ATCCCAGCAG CAOAGCCCAT COAGTAOGCT CAGTTCCCTT TCTACCTCAA OGGCCTACGA 3000
OACACCTCAO ACTTtGTOOA AGCCATAGAA AAAGTGAGAO TCATCTGTAA CAACTATACG 3060
AOCCTGGGAC TGTCCAOCTA CCOCAATGCC TACCCCTTCC TGTTCTGGCA GCAATACATC 3120
AGCCTGC6CC ACT0CCT6CT 6CTATCCATC AGCGTGGTGC TGGCCTGCAC GTTTCTAGTG 3180
TGC6CAGTCT TCCTCCTGAA CCOCTGGACG GCCGGGATCA TTGTCATCGT CCTGGCTCTC 3240
ATGACOOTTG A6CTCTTTOO CATGATGGGC CTCATTGGGA TCAAGCTGAO TGCTGTGCCT 3300
GTGGTCATCC T6ATTGGATC TOTTCGCATC GGAOTGGAGT TCAOCGTCCA CGT6GCTTTG 3360
GCCTTTCTOA CAGCCATTOG OQAGAAGAAC CACAOGGCTA TGCTOGCTCT GGAACACATG 3420
TTTGCTCCCG TTCTGGAOGG TGCTCTOTCC ACTCTGCTGC GTGTACTOAT GCTTGCAGGG 3480
TC3CGAATTTC ATTTCATTGT CA6ATACTTC TTTGCCGTCC TGOCCATTCT CACXiGTCTTG 3540
G600TTCTCA ATGGACTGGT TCTGCTGCCT GTCCTCTTAT CCTTCTTTGG ACCGTGTCCT 3600
GAGGTGTCTC CAGCCAATGG CCTAAACOGA CTCCCCACTC CTTCGCCTGA CCCGCCTCCA 3660
AOTGTCOTCC CGTTTOCOGT GCCTCCTOGT CACACGAACA ATGGGTCTGA TTCCTCCGAC 3720
TCGGACTACA GCTCTCAGAC CAOGGTGTCT GGGATCAGTG AGGAGCTCAG GCAATA06AA 3780
GCACAGGAGG GTGCCGGAGG CCCTGCCSCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840
GTCTTTGCCC GOTCCACTGT GGTCCATCOG GACTCCAGAC ATCAGCCTCC CTTOACCCCT 3900
CGGCAACAGC CCCACCTGGA CTCTOGCTCC TTGTCCCCTG GAOGGCAAGG CCAGCAGCCT 3960
OGAAGGGATC CCOCTAGAGA AGGCTT606G CCACCCCCCT ACAGACOGCG CAGA6ACGCT 4020
TTTGAAATTT CTACTGAAGG GCATTCTGGC CCTAOCAATA GGGACCGCTC AGGGCCCCGT 4080
G66GCCCGTT CTCACAACCC TOGGAACCCA ACOTCCACCG CCATGGGCAG CTCTCTGCCC 4140
AGCTACTGCC AGCCCATCAC CACTGTGAOG GCTTCTCCTT CGGTGACTGT TGCTGTGCAT 4200
CCOCOGCCTG GACCTGGGOG CAACCCCGGA GGGGGGCCCT GTCCAGGCTA TGAGAGCTAC 4260
OCTGAGACTG ATCAOGGGGT ATTTOAGOAT CCTCATGTCC CTTTTCATGT GAGGTOTGAG 4320
AGGAGGGACT CAAAGGTGGA GGTCATAOAG CTACAGGACG TGGAAT6TCA GGAGAGGCOG 4380
TGGGGGAGGA GCTCCAACTO AGGGTAATTA AAATCTGAAG CAAAGAGGCC AAAGATTGGA 4440
AAGCCCCGCC CCCACCTCTt TCCAGAACTG CTTGAAGAOA ACTGCTTGGA ATTATGGGAA 4500
GGCAGTTCAT TGTTACTGTA ACTOATTCTA TTATTKKOTG AAATATTTCT ATAAATATTT 4560
AARAGGTQTA CACATOTAAT ATACATGGAA ATGCTGTACA GTCTATTTCC IGGGGCCTCT 4620
CCACTCCTGC COCAGAGTGG OGAOAGCACA OGGGCCCTTT CCCCTGTGTA CATTCCTCTC 4680
TGTOCCACAA CCAAGCTTAA CTTA6TTTTA AAAAAAATCT CCCAGCATAT GXCGCTGCTG 4740
CTTWU^TATT GTATJATTTA CTTGTATAAT TCTATGCAAA TATTGCTTAT OTAATACGAT 4800
TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGCrAOG 4860
ATCAATTGTT ACTGTTAACT TTTGAACACG CTATGCOTGG TAATTGTTTA ACGAGCAGAC 4920
ATGAAGAAAA CACGTTAATC CCAOTOGCW CTCTAGGGGT AGTTGTATAT GGTTCGCATG 4980
GGTGGATGTG TGTGTCCATG TGACTTTCCA ATGTACTGTA TT6TGCTTTG TTGTTGTTGT 5040
TGCTGTTGIT GTTCATTTTG GTGTTTTTOG TTGCTTTGTA TGATCTTACC TCTGGCCTAG 5100
OTOCCCTGCG AAGGTCCACG TCTTTTTCTO TCXSTGATGCT GGTGGAAA6G TGACCCCAAT 5160
CATCTGTCCT ATTCTCTGGG ACTATTC 5187

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1311 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Mat Val Ala Pro Aap Sar Olu Ala Pro Sar Asn Pro Arg Ha Thr Ala 
1 5 10 15 

Ala HiB Glu Sar Pro Cya Ala Thr Glu Ala Arg His Sar Ala Asp Lau 
20 25 



Tvr lie Arg Thr Ser Trp Val Aap Ala Ala Lau Ala Leu Ser Glu Lau 
35 « « 

Glu Lys Gly Asn Ila Glu Gly Gly Arg Thr ser Leu Trp He Arg Ala 
50 55 ^0 

Trp Lau Gin Glu Gin Leu Phe He Leu Gly Cys Phe Leu Gin Gly Asp 



65 



70 75 



Ala Gly Lys Val Leu Phe Val Ala He Leu Val Leu Ser Thr Phe Cys 
85 90 

val Gly Leu Lys Ser Ala Gin He His Thr Arg Val Asp Gin Leu Trp 
100 105 

val Gin Glu Gly Gly Arg Leu Glu Ala Glu Leu Lys Tyr Thr Ala Gin 



lis 



120 125 



Ala Leu Gly Glu Ala Asp Ser Ser Thr His Gin Leu Val He Gin Thr 
130 "5 WO 
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Ala Ly. A.p Pro A.p V.l s.r teu HI. Pro CXy ax. x..u Olu 

iSS 160 

Hi. L» ,.1 V.1 ,1. „. ^, 

^'^^ 175 
A.P U. .Xu I,. »^ ^ ^ ^ 

«.P Ph. «„ Hi. Hi. „. „. „. 

205 

pro q,. „. „. „. ^_ ^ 

220 

J~ L.U Oiy P„ MP „. ^ 

240 

reu cm xrp Thr Hi. x.„ Pro L.u Olu Val VaX CX« GXu VaX Ly. 

255 

*•'• ^ - "u »i. T,r K« Ly. 

270 

«r9 »X. «, U. T., ... Tj. H« .y. .y. ,„ ey. L.„ ,.p p„ 

280 285 
Thr J.p pro Hi. cy. Pro M. r,r AX. Pro A.„ x.y. x.y. ser CXy Hi. 

IXe Pro Aap V.X Ala AXa OXu Leu s-r h*- r-, _ 

305 ^""^ Hi* Cly Cy. Tyr cXy Phe AXa 

320 

«. M. Ty. H« Hi. rr^ p„ ^^^^ 

335 

Arg A.n s.r ,Jr S.r AXa t.„ Arg Lys AX. Arg Xaa CXn Thr VaX 

345 

V.1 «» J.. »« „y ^ „ H.. Ty. Olu ry. rr, M. 

365 

tyr v.1 Hi. „„ u. «y rr, M„ oi. «. .y. „. „. 

380 

JJ« ASP AX. Trp Cl„ jrg Ly. Ph. AX. AXa CXu V.X Arg Lyu IX. Thr 

«r s„ «y H.. V . ^ 3„ ^ 

4X0 

^0° "* »~ v.1 s.r u. .y. 

430 

**" "• '~ ^ "» «.t L.„ .1. ^ v.1 la. v.1 

445 

^ II. Cln Trp Arg A-p Pro IX. Ar, s.r Cl„ AX. OXy V.l cXy ,X. 
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450 «5 

Ala Oly Val Leu L.« Leu Ser He Thr Val Ala Ala Oly Leu Cly Phe 
465 *'0 

cya Ala Leu Leu Gly lie Pro Phe Asn Ala Ser Ser Thr Gin Ue Val 

pro Phe Leu Ala Leu Gly Leu Oly Val Oln Asp Met Phe Leu Leu Thr 
500 505 



His Thr Tyr Val Glu Gin Ala Gly Aap val Pro Arg Glu Glu Arg Thr 
515 520 52b 

Oly Leu val Leu Lye Lye Ser Gly Leu Ser Val Leu Leu Ala Ser Leu 
530 535 540 

eye A-n Val Het Ala Phe Leu Ala Ala Ala Leu Leu Pro He Pro Ala 



550 555 



545 

Phe Arg Val Phe Cys Leu Gin Ala Ala lie Leu Leu Leu Phe A-n Leu 
565 

Gly ser He Leu Leu Val Phe Pro Ala Met He Ser Leu Ajp Leu Arg 

580 

Arg Arg Ser Ala Ala Arg Ala A.p Leu Leu Cys Cys Leu Met Pro Glu 

595 

ser pro Leu Pro Lye Ly. Ly. He Pro Glu Arg Ala Ly. Thr Arg Lys 

610 6" 
Asn A.p Ly. Thr Hi. Arg He Asp Thr Thr Arg Gin Pro Leu Asp Pro 



625 



630 635 



ASP val ser Glu A.n Val Thr Ly. Thr Cys Cy. Leu Ser Val Ser Leu 

650 



645 



Thr Lys Trp Ala Lys A.n Gin Tyr Ala Pro Phe He Met Arg Pro Ala 
660 665 

val Lys val Thr Ser Met Leu Ala Leu He Ala V.l He Leu Thr Ser 
675 680 685 

val Trp Gly Ala Thr Ly. V.l Lys Asp Gly Leu Aep Leu Thr Asp He 



690 



695 



val pro Glu Asn Thr Asp Glu His Glu Phe Leu Ser Arg Oln Glu Lys 
705 710 715 



Tyr Phe Gly Phe Tyr Asn Met Tyr Ala Val Thr Gin Gly Asn Phe Glu 
725 

Tyr pro Thr Asn Gin Ly. Leu Leu Tyr Glu Tyr His Asp Gin Phe Val 



Arg He Pro Asn He He Lys Asn Asp Asn Gly Gly Leu Thr Lys Phe 
755 760 765 
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Trp L^ma Ser L«u Ph. Arg A«p Trp Leu Leu Asp X,.u Oln Val Ala Phe 

775 780 

A«p Lya Olu Val Ala Ser Oly Cya He Thr Oln 61« Tyr Trp cya Lye 

790 

Aan Ala Sar Aap Glu 61y He Leu Ala Tyr Lya Leu Met Val Cln Thr 

«05 810 

Oly Hia Val Aap Aan Pro He Aap Ly. Ser Leu lie Thr Ala oly Hia 
"° 825 830 

Arg Leu Val Aap Ly. Aap Oly lie He Aan Pro Ly. Ala Phe Tyr Aan 



840 845 



Tyr Leu Ser Ala Trp Ala Thr Aan Aap Ala Leu Ala Tyr Oly Ala Ser 

B55 860 

Gin Oly Aan Leu Lya Pro Oln Pro Gin Arg Trp He Hia Ser Pro Glu 
* ®70 875 880 

Aap val Hia Leu Glu He Lya Lya Ser Ser Pro Leu He Tyr Thr Cln 

890 895 

Leu Pro Phe ^r Leu Ser Oly Leu Ser Aap Thr Xaa Ser He Lya Thr 

905 

Leu He Arg Ser Val Arg Aap Leu Cya Leu Lya Tyr Glu Ala Lya Gly 

J25 

Leu Pro Aan Phe Pro Ser Gly He Pro Phe Leu Phe Trp Glu Gin Tyr 
S30 935 

Leu Tyr Leu Arg Thr Ser Leu Leu Leu Ala Leu Ala Cya Ala Leu Ala 

955 960 

Ala val Phe He Ala Val Met Val Leu Leu Leu Aan Ala Trp Ala Ala 
965 970 975 

Val Leu Val Thr Leu Ala Leu Ala Thr Leu Val Leu Gin Leu Leu Gly 

985 990 

val Met Ala Leu Leu Oly Val Ly. Leu ser Ala Met Pro Ala Val Leu 
'9* 1000 1005 

^" ^'he Thr Val Hia Leu Cya 

"1" 1015 1020 

J« ciy V.1 ^ ,x. e,. L,. ^, ^ «, „. s„ 

1035 1040 

Ala Leu Glu Ser Val Leu Ala Pro Val Val Hia Gly Ala Leu Ala Ala 
^"♦S 1050 1055 

Ala L.U Ala Ala Ser Met Leu Ala Ala Ser 01« Cya Gly Phe Val Ala 

1065 1070 

Arg Leu Phe Leu Arg Leu Lau Leu Aap He Val Phe Leu Oly Leu He 
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1075 



1080 1085 



ASP Oly L«u L.« Phe Ph. pro 11. Val L.u Ser lie Leu Gly Pro Ala 
1090 1095 1100 

Ala Glu val Mr, Pro lU Glu Hi. Pro Glu Arg Le« Ser Thr Pro Ser 
1105 1110 

pro Ly. cy. Ser Pro lie Hi. Pro lurg Ly.^Ser Ser Ser Ser Ser^Gly 



1125 



Gly Gly A.p Lye^ser Ser Arg Thr Ser^Ly. Ser Al. Pro Arg^Pro cy. 
Ala pro ser teu Thr Thr He Thr Glu Glu Pro Ser Ser Trp Hi. Ser 



1155 



1160 11«5 



ser Ala Hi. Ser Val Gin Ser Ser Met Gin Ser lie Val Val Gin Pro 

1170 1"5 
Glu val val val Glu Thr Thr Thr Tyr A.n Gly Ser Aep Ser Ala Ser^ 
1185 1190 1195 

Gly Arg Ser Thr Pro^Thr Ly. Ser Ser Hi.^Gly Gly Ala He Thr^Thr 

Thr Ly. val Thr^Al. Thr Al. A.a Ile^Ly. val Glu Val Val^Thr Pro 

ser A.p Arg Ly. Ser Arg Arg Ser Tyr Hi. Tyr Tyr A.p Arg Arg Arg 
^ 1235 "*0 12« 

A.p Arg A.p Glu A.p Arg A.p Arg A.p Arg Glu Arg Asp Arg A.p Arg 
1250 1255 1250 

A.P Arg A.p Arg A.p Arg A.p Arg A.p Arg A.p Arg A.p Arg A.p Arg^ 



1265 



1270 ^275 



Glu Arg ser Arg Clu^Arg Aap Arg Arg A-p^Arg Tyr Arg Aep Clu^Arg 

ASP His Arg Ala Ser Pro Arg Glu Lys Arg Gin Arg Phe Trp Thr 
1300 1305 A^A" 

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4434 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
CO»^C^ GAGC»AOTOA OAOTIUJOCAO AOOGTC«;„ TTCTOTOTTC ACTGTCGCCC
ACC«C,«:A0 COOCAIUUiC* OIO«CAC»0 AOOCCCCCTO COOVAOAGAO ACTOAOAaxo 
AGAAACAGCG GCCCGOGCTC GCCTAATGAA GTTOTtCCCC TGCCTGCCGT GCCGCATCCA 
CGAGATACAG ATACATCTCT CATG6AC0GC 6ACAGCCTCC CACGCGTTCC GGACACACAC 
GGCGATGTCG TOGATGAGAA AITATTCTCG CATCTTTACA TACGCACCAG CTGOGTGOAC 
GCCCAAGTGO CGCTOGATCA GAIAGATAAC C6CAAAG0GC GTGGCAGCCG CAOGGCCATC 
TATCT«OOAT CAOTATTCCA GTCCCACCTC OAAACCOTCG GCAGCTCCGT GCAAAAGCAC 
GCGCCCAACO TGCTATTCGT GGCTATCCTO GTGCTGAGCA CCTTCTGCGT OGCCCTGAAG 
AGCGCCCAGA TCCACTCCAA GOTGCACCAG CTCTGCATCC AGGAGGCCGG CCGGCTCGAG 
GOGOAACTOO CCTACACACA GAAOAOGATC GGCGAGGAOG AGTCCGCCAC CCATCAGCTG 
CTCATTCAGA CGACCCAOGA CCOGAACCCC TCCGTCCXGC ATCOGCAGGC GCTGCTTGCC 
CACCTOOAGG TCCTGGTCAA GGCCACOGCC G«:AACCTCC ACCTCTAOGA CACCCAATGG 
GCGCTOOGCG ACA«TCCAA CAKCCGAGC AOJCCCTCCT TCOAGGGCAT CTACTACATC 
GAGCAOATCC TGCGCCACCT CATTCOGTGC TOGATCATCA CGCCGCTGGA C«;TTTCTGG 
GAGCGAAGCC AGCTGTTOOG TCOGGAATCA GOGGTCGTTA TACCACGOCT CAACCAAOGA 
CTCCTOTGGA CCACOCT«AA TCCOGCCTCT GTGATGCAGT ATATCAAACA AAAGATGTCC 
GAGGAAAAGA TCAGCTTCGA CTTOGACACC GTGGAGCAGT ACATGAAGCG TGCGGCCATT 
GGCAGTGGCT ACATOOAGAA GCCCTGCCTC AACCCACTGA AICCCAATTC CCC»GACAOG 
GCACCGAACA AGAACAGCAC CCAGCCOCOG GATGTGCOAG CCATCC^TGTC OGGAGGCTGC 
TACGGTTATG CCGOCAAGCA CAI«CACTGG CCGGAGGAGC T^ATTGWGG OGCACGGAAG 
AGGAACCGCA GCGCACACTT OAGOAAGGCC CAGOCCCTGC AGTCGGTGGT GCAGCTGATG 
ACOGAGAAGG AAAXCTAOGA CCAGTGGCAG GACAACTACA AGGTGCACCA TCTTGGAXGG 
ACGCAGOAGA AGGCAGCGCA GGTTTTGAAC CCCXGGCAGC CCAACTTTTC GOGGGAGGW 
GAACAGCTGC TACGTAAACA GTCGACAATT GCCACCAACT ACXSATATCTA CGTGTTCAGC 
TCGGCT«CAC TGGATGACAT CCTGGCCAAG TTCTCCCATC CCACOGCCTT GTCCATTOTC 
ATOGGCGTGG COCTCACCGT TTTGTATGCC TTTTGCACGC TCCTCOGCTC GAGGGACCCC 
GTCCGTGGCC AGAGCAGTGT GGCOGTGGCC CGAGTTCTCC TCATGTCCTT CAGTACCGCC 
GCOGGATTGG GATTCTCAGC CCTGCTCGCt ATCGTTTTCA ATCCGCTGAC CGCTGCCIAT 
GCGGAGAGCA ATCGCCGGCA GCAOACCAAG CTGATTCICA AGAACGCCAG CACCCA6GTG 
GTTCCOTTTT TOOCCX:TTCO TCTCGOCGtC OATCACATCT TOVTACTCOO ACOGACCKTC
CTGTTCAOTO CCTOCAOCAC OOCAOOATCC TTCTTTOCGO COCCCTTTAT TCCGGT«;CCG 
GCTTTGAAOG TATTCTGTCT OCAOCCTGCC ATCGTAATGT GCTCCAATTT OGCAGCGGCT 
CTATTCGTTT TTCCGGCCAT 6ATTT0GTTG GATCTAOGGA GACGTACOGC CGGCAGGGCG 
GACATCTTCT GCTOCTOTTT TCCCOTGTGG AAGCAACAOC CGAACCTGGC AOCTCCGGTG 
CTOCCGCTOA ACAACAACAA CCCOCOCGOC CCCOGGCATC CGAAGAGCTG CAACAACAAC 
AGGGTGCCGC TOCCCGCCCA GAATCCTCTO CTGGAACAGA GGGCAGACAT CCCTCCGAOC 
KGTCACTCAC TOGCOTCCTT CTCCCIGOCA ACCTTCGCCT TTCAGCACTA CACTCCCTTC 
CTCATGCGCA GCTGGCTGAA GTTCCTGACC GTTATGGGTT TCCTGGOCGC CCTCATATCC 
AOCTTOTATG CCTCCAOGCG CCTTCAOOAT GGCCTGGACA TTATTGATCT GGTGCCCAAG 
GACAOCAAOG AGCACAAOTT CCTGGATGCT CAAACTCGGC TCTTTGGCTT CTACAOCATG 
TATGCGGTTA CCCACGCCAA CTTTGAATAT CCCACCCAGC AGCAGTTGCT CAGGGACTAC 
CATGATTCCT TTGTGOGGGT GCCACATGTG ATCAAGAATG ATAACGGTGG ACTCCCGGAC 
TTCTOCCTGC TGCTCtTCAO 06AGTGGCTG GOTAATCTGC AAAAGATATT CGACGAGGAA 
TACCGCCACG GACGGCTCAC CAAOCAGTCC TCGTTCCCAA ACGCCAGCAG CGATGCCATC 
CTGGCCTACA AGCTAATCGT GCAAACCGGC CATGTGOACA ACCCCGTOGA CAAGGAACTG 
GTGCTCACCA ATCGCCTGGT CAACAGOGAT GGCATCATCA ACCAACGCGC CTTCTACAAC 
TATCTGTCGG CATGGGCCAC CAACCAOGTC TTCGCCTACG GAGCTTCTCA GGGCAAATTO 
TATCCGGAAC CGCGCCACTA TTTTCACCAA CCCAACGAGT ACGATCTTAA GATACCCAAG 
AGTCTGCCAT TOGTCTACGC TCAOATGOCC TTTTACCTCC ACGCACTAAC AGATACCTCG 
CAGATCAAGA CCCTGATAGG TCATATTCGC GACCTGAGCG TCAAOTACCA CGGCTTCGGC 
CTGCCCAACT ATCCATCGGG CATtCCCTTC ATCTTCTGCG AGCAGTACAT GACCCTGCGC 
TCCTCACTGG CCATGATCCT GGCCTGCGTG CTACtCGCCG CCCTGGTGCT GGTCTCCCTG 
CTCCTGCTCT CCGTTTGGGC CCCCCTTCTC 6TGATCCTCA GCGTTCT6GC CTCGCTGGCC 
CAGATCTTTG GGGCCATGAC TCTGCTC6GC ATCAAACTCT CGGCCATTCC GGCAGTCATA 
CTCATCCICA GOGTGGCCAT GATGCTGT6C TTCAATGTGC TGATATCACT CGCCtTCATG 
ACATCOGTTG GCAACCGACA GOGCCGCCTC CAGCTGAGCA T6CAGATGTC CCTGOGACCA 
CTOXSTCCACG OCATGCTGAC CTCC«GACTG CCCGTGTTCA TGCTCTCCAC GTCGCCCTTT 
GAGTTTCTGA TCCGOCACTT CTGCTGGCTT CTGCT60TGG TCTTAT0C6T TGGC6CCTGC 
AACAGCCTTT TOOTOTTCCC CJVTCCTACTC AOCATGCTCO GACCGOAOGC CGACCTGCTG
CCGCTGCAGC ATCCAGACCG CATATCCACG CCCTCTCCGC TGCCCGTCOG CAGCAGCAAG 
AGATCGGCCA AATCCTATCT GGTOCAOGCA TCGCGATCCT CGCGA6GCAG CTGCCAGAAG 
TCGCATCACC ACCACCACAA AOACCTTAAT CATCCATCGC TGACGACGAT CACCGAGGAG 
CCOCA6TCGT GGAAGTCCAG CAACTCCTCC ATCCAGATGC CCAATOATTG GACCTACCAG 
CCCCGCCAAC AG06ACCCCC CTCCTACOCC CCCCCCCCCC CCGCCTATCA CAACOCCCCC 
GCCCAGCAGC ACCACCACCA TCAOOCCCCO CCCACAACCC CCCCCCCTCC CTTCCCGACG 
GCCTATCCGC CGGAGCTGCA OAGCATCGTG GTGCAGCCGG AGCTCACCGT GGAGACGACG 
CACTOGGACA GCAACACCAC CAAGGTGACG GCCACGGCCA ACATCAAGGT GGAGCTCGCC 
ATCCCCCGCA GG6CGOTGC6 CA6CTATAAC TTTACGAGTT AGCACTAOCA CTAGTTCCTC 
TAGCTATTAG GACOTATCTT TAGACTCTAG CCTAAOCCGT AACCCTATTT GTATCTGTAA 
AATCGATTTG TCCAOCOGGT CTCCTGAGCA TTTCGTTCTC ATGGATTCTC ATGGATTCTC 
ATGGATGCTT AAATOOCATC OTAATTGCCA AAATATCAAT TTTTCTOTCT CAAAAAGATG 
CATTACCTTA TGGTTTCAAG ATACATTTTT AAAGAGTCOG CCAOATATTT ATATAAAAAA 
AATCCAAAAT OCACCTATCC ATGAAAATTG AAAAGCTAAG CAGACCCGTA TGTATQTATA 
TOTGTATGCA TGTTAGTTAA TTTCCOGAAG TCCGGTATTT ATAGCAGCTG CCTT 
(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 128S amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Met Asp Arg Asp Ser Leu Pro Arg Val Pro Asp Thr His Gly Asp Val
5 10 15

Val Asp Glu Lys Leu Phe Ser Asp Leu Tyr Ile Arg Thr Ser Trp Val
25 30

Asp Ala Gln Val Ala Leu Asp Gln Ile Asp Lys Gly Lys Ala Arg Gly
40 45

Ser Arg Thr Ala Ile Tyr Leu Arg Ser Val Phe Gln Ser His Leu Glu
55 60

Thr Leu Gly Ser Ser Val Gln Lys His Ala Gly Lys Val Leu Phe Val
65 70

Ala Ile Leu Val Leu Ser Thr Phe Cys Val Gly Leu Lys Ser Ala Gln
85 90 95

Ile His Ser Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Arg Leu
100 105

Glu Ala Glu Leu Ala Tyr Thr Gln Lys Thr Ile Gly Glu Asp Glu Ser
115

Ala Thr His Gln Leu Leu Ile Gln Thr Thr His Asp Pro Asn Ala Ser
130 135 140

Val Leu His Pro Gln Ala Leu Leu Ala His Leu Glu Val Leu Val Lys
145 150 155

Ala Thr Ala Val Lys Val His Leu Tyr Asp Thr Glu Trp Gly Leu Arg
165 170

Asp Met Cys Asn Met Pro Ser Thr Pro Ser Phe Glu Gly Ile Tyr Tyr
180 185

Ile Glu Gln Ile Leu Arg His Leu Ile Pro Cys Ser Ile Ile Thr Pro
195 200 205

Leu Asp Cys Phe Trp Glu Gly Ser Gln Leu Leu Gly Pro Glu Ser Ala
210 215 220

Val Val Ile Pro Gly Leu Asn Gln Arg Leu Leu Trp Thr Thr Leu Asn
225 230 235 240

Pro Ala Ser Val Met Gln Tyr Met Lys Gln Lys Met Ser Glu Glu Lys
245 250 255

Ile Ser Phe Asp Phe Glu Thr Val Glu Gln Tyr Met Lys Arg Ala Ala
260 265 270

Ile Gly Ser Gly Tyr Met Glu Lys Pro Cys Leu Asn Pro Leu Asn Pro
275 280

Asn Cys Pro Asp Thr Ala Pro Asn Lys Asn Ser Thr Gln Pro Pro Asp
290 295 300

Val Gly Ala Ile Leu Ser Gly Gly Cys Tyr Gly Tyr Ala Ala Lys His
305 310 315

Met His Trp Pro Glu Glu Leu Ile Val Gly Gly Arg Lys Arg Asn Arg
325 330 335

Ser Gly His Leu Arg Lys Ala Gln Ala Leu Gln Ser Val Val Gln Leu
340 345

Met Thr Glu Lys Glu Met Tyr Asp Gln Trp Gln Asp Asn Tyr Lys Val
355 360 365

His His Leu Gly Trp Thr Gln Glu Lys Ala Ala Glu Val Leu Asn Ala
III L.U Leu Arg Ly. cin



395 



400 



S.r Arg He Al. Thr Asn Tyr Asp He Tyr Val Phe Ser Ser Al. Al. 



410 



415 



I-u A.P A.P IX. r.eu AX. rye Ph. Ser „u p,o ser AX. teu Ser IX. 

430 

V.1 ne Oiy V.X AX. V.l ,hr VaX L.« Tyr AX. Ph. Cy. Thr Leu I.u 
Arg Arg A.p Pro V.X Arg Oly CX„ s.r Ser V.X OXy V.X AX. cXy 



V.X Leu Leu Met Cy. Phe Ser Thr AX. AX. GXy Leu GXy 



475 



I^u Ser Ala 
480 



I-u Leu CXy IX. V.X Ph. A.„ AX. L.u Thr AX. AX. Tyr AX. oXu s.r 

495 

Asn Arg Arg OXu OXn Thr Ly. Leu iXe Leu Lye Aen aX. s.r Thr CX„ 

V.X V.X Pro Ph. L«. AX. L.u CXy L.U CXy V.X A.p Hie IX. Ph. IX. 

525 

..1 OXy ,„ ,„ XI. ^ ^ 

s« 540 
Phe AX. AX. AX. Phe IXe Pro Val Pro ai- t 

545 " V.X Phe Cy. Leu 

555 

GXn AX. AX. IX. V.1 Met Cy. S.r A.„ L.u AX. AX. AX. ^ V.X 

^'^^ 575 

«<. Pr» «. J« «. .„ t» MP J.. ^ ^ ,^ 

590 

M. ASP II. Ph. cys cy. Cy. Ph. P,. v.X Trp Lys GXu CXn Pro Ly. 

*™ 605 

Val Ala Pro Pro Val Lou Pro r*» » 

610 lit AX. 

«5 gjO 

Arg Hi. Pro Ly. s.r a.„ Asn Asn Arg v.l Pro L.u Pro AX. cXn 

635 g4o 
As„ Pro L.U L.U Olu ci„ Arg AX. Aap IX. Pro CXy S.r S.r Hi. s.r 

655 

«U s.r JJ. ..r „. , ,^ ^ 

670 

Phe L«u Met Arg Ser Tro Val rv« ^k-. * 

675 ^ ^'^^ ««t Cly Phe Leti 



685 
«. «. «. S~ «« »•

690 

„„ «p 11. II. MP "» ?S "° "J 

705 "» 



«p «. OU Thr «, L«. »iy PJ. ^ ^ 

725 

. clu Tvr pro Thr oln Gin Gin Leu Leu Arg Asp 

Thr Gin Oly Aen Phe Glu Tyr Fro *n 

740 

„C HI. MP «- «" !S "° 

755 

» . 1.™ Phe xrp tmu Leu Leu Phe Ser Clu Trp Leu Gly Asn 
Gly Leu Pro Asp Pne :irp i-«u 

770 

.in L,. n. F" MP "« ?S °" 'O" 

Z C. trp PM pro ». U. s.r =.r MP «• - 

805 

^„ .1. ... .in Tn. .1, Hi. V.1 MP Mn pro ..1 MP ^ 

820 

1 c*»*» lian Glv lie lie Asn Gin Arg 

Val Leu Thr Aan Arg Leu Val Asn Ser Asp Oly 

835 

«. P,» T,r Mn Trr L» ..r Trp «. «.r Mn Mp V.l Ph. «. 

850 

Cly Al. ser Gin Gly Lye Leu Tyr Pro Glu Pro Arg Gin Tyr Phe 

865 

.in pro Mn .!« Tyr »«. -y "J =~ ^ 

885 

v.1 T,r Al. .1. «.t p" p" ^' - "« ^s"" 

900 

.in 11. L,. T« XI. 01, Hi. n. «, MP «u S.r ».! L,. Tyr 

915 

.in .1, PM .1, p- M» "° »" 

930 

Trp .m .in tV« ».t - "* »S 

945 

C. ,.1 «u t.n M. «. «J '^s 

V.1 rrp ». «. v.. V.1 XI. - V.1 «. .« - 

980 

Gln Ile Phe Gly Ala Met Thr Leu Leu Gly Ile Lys Leu Ser Ala Ile
995 1000 1005

Pro Ala Val Ile Leu Ile Leu Ser Val Gly Met Met Leu Cys Phe Asn
1015 1020

Val Leu Ile Ser Leu Gly Phe Met Thr Ser Val Gly Asn Arg Gln Arg
1035 1040

Arg Val Gln Leu Ser Met Gln Met Ser Leu Gly Pro Leu Val His Gly
1045 1050 1055

Met Leu Thr Ser Gly Val Ala Val Phe Met Leu Ser Thr Ser Pro Phe
1065 1070

Glu Phe Val Ile Arg His Phe Cys Trp Leu Leu Leu Val Val Leu Cys
1075 1080 1085

Val Gly Ala Cys Asn Ser Leu Leu Val Phe Pro Ile Leu Leu Ser Met
1095 1100

Val Gly Pro Glu Ala Glu Leu Val Pro Leu Glu His Pro Asp Arg Ile
1115 1120

Ser Thr Pro Ser Pro Leu Pro Val Arg Ser Ser Lys Arg Ser Gly Lys
1125 1130 1135

Ser Tyr Val Val Gln Gly Ser Arg Ser Ser Arg Gly Ser Cys Gln Lys

Ser His His His His His Lys Asp Leu Asn Asp Pro Ser Leu Thr Thr
1160 1165

Ile Thr Glu Glu Pro Gln Ser Trp Lys Ser Ser Asn Ser Ser Ile Gln
1175 1180

Met Pro Asn Asp Trp Thr Tyr Gln Pro Arg Glu Gln Arg Pro Ala Ser
1195 1200

Tyr Ala Ala Pro Pro Pro Ala Tyr His Lys Ala Ala Ala Gln Gln His
1205 1210

His Gln His Gln Gly Pro Pro Thr Thr Pro Pro Pro Pro Phe Pro Thr
1220 1225 1230

Ala Tyr Pro Pro Glu Leu Gln Ser Ile Val Val Gln Pro Glu Val Thr

Val Glu Thr Thr His Ser Asp Ser Asn Thr Thr Lys Val Thr Ala Thr
1255 1260

Ala Asn Ile Lys Val Glu Leu Ala Met Pro Gly Arg Ala Val Arg Ser
1275 1280

Tyr Asn Phe Thr Ser
1285

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 345 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

AACCTCCATC AGCTTTOCAT ACAGGAAGGT G0TTC6CTCC AGCAT0A6CT A6CCTACACG 60 

CAGAAATCGC TCGGCGAGAT OGACTCCTCC ACeCACCAGC TGCTAATCCA AACNCCCAAA 120 

GATATGGACO CCTCGATACT GCACCOGAAC GCGCTACT6A CGCACCTGGA COTOGTOAAG 180 

AAAGCOATCT CGGTGACG6T GCACATGTAC OACATCACGT GGAGNCTCAA GGACATOTGC 240 

TACTCGCCCA OCATACCGAG NTTCOATACG CACTTTATCO AOCAGATCTT CGAOAACATC 300 

ATACCOTGOG 06ATCATCAC GCCGCTGOAT TGCTTTTGGG AGOGA 345 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Ser Leu Glu His Glu
1 5 10 15

Leu Ala Tyr Thr Gln Lys Ser Leu Gly Glu Met Asp Ser Ser Thr His
20 25 30

Gln Leu Leu Ile Gln Thr Pro Lys Asp Met Asp Ala Ser Ile Leu His
35 40 45

Pro Asn Ala Leu Leu Thr His Leu Asp Val Val Lys Lys Ala Ile Ser
50 55 60

Val Thr Val His Met Tyr Asp Ile Thr Trp Xaa Leu Lys Asp Met Cys
65 70 75 80

Tyr Ser Pro Ser Ile Pro Xaa Phe Asp Thr His Phe Ile Glu Gln Ile
85 90 95



Phe Glu Asn Ile Ile Pro Cys Ala Ile Ile Thr Pro Leu Asp Cys Phe
100

Trp Glu Gly
115

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GGGTCTGTCA CCC6GAGCCC GAGTCCCOCG COGCCAGCAG CGTCCTCGOG AGCCGAGCGC 
CCAGOOGOGC CCCGACCCCG CGGCGGCGGC GGCAACATGG CCTCGGCTGG TAACCCCGCC 
CGGGCCCTGC GCACCCJWKJC CGCCCGOOGO AGGOCCAGAC GGACCGGGOO ACCGCACOGC 
GCCGCGCCCG ACCGCGACTA TCT0CACC6G CCCACCTACT GCGACGCCGC CTT06CTCTG 
GAGCAGATTT CCAAGCCGAA 06CTACT0GC CCGAAAGOGC OOCTGTGGCT CAGAGCGAAG 
TTTCACAGAC TCTTATTTAA ACTOGGTTGT TACATTCAAA AGAACTGOCC CAAGTTTTTG 
OTTCTGCCTC TCCTCATATT TGGGCCCTTC GCTGTGGGAT TAAACGCAGC TAATCTCGAO 
ACCAACGTGG ACCAGCTOTO 00TC6AAGTT CCTGGAOCAG TGAGTCCACA ATTAAATTAT 
ACCCGTCAGA AGATA06AGA AGA6GCTATG TTTAATCCTC AACTCATGAT ACAGACTCCA 
AAAGAACAAG GOCCTAATOT TCTGACCACA CAGCCTCTCC TGCAACACCT GGACTCAGCA 
CTCCAGGCCA OTCOTOTGCA CCTCTACATG TATAACACCC AATCGAAGTT GGAACATTTG 
TGCTACAAAT CAGOOOAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 
CTTTACCCTT CCTTAATCAT TACACCTTTG GACTGCTTCT GGGAAGOGCC AAAGCTACAG 
TCOOGGACAG CATACCTCCT AGGTAAOCCT CCTTTAOCGT GGACAAACTT T6ACCCCTTG 
GAATTCCTA6 AAGAGTTAAA OAAAATAAAC TACCAAGTGG ACA6CTGCGA GCAAATGCTG 
AATAAA0C06 AAGTTCOCCA TCGGTACATG GACCGGCCTT 6CCTCAACCC A6CCGACCCA 
GATTCCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTGATGT CGCCCTTGTT 
TTCAATGOTG OATGTCAACG TTTATCCAGC AAGTATATGC ATTGGCACCA OGACTTCATT 
GTGGGTGOTA CCOTCAACAA TGCCACTOCA AAACTTGTCA GOCCTCACGC CCTGCAAACC 
ATGTTCCMT TAATGACTCC CAAGCAAATO TATGAACACT TCAGGGGCTA CGACTATGTC
TCTCACATCA ACTGGAATOA AGACAGCGCA GCCGCCATCC TGGAGGCCTG GCAGAGCACT 
TACGTGGAOG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 
ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTG ATGTCAGTGX CATCCGAGTG 
GCCACCGGCT ACCTACTGAT CCTTGCCTAT GCCtGTTTAA CCATGCTGCG CTGGGACTGC 
TCCAAGTCCC AGGGTGCCGT CGGGCTGCCT GGCOTCCTGT TCGTTGCGCT GTCAGTGOCT 
GCAOGAMGG GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGCGAC AACTCAGGTT 
TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATCATGTCT TCCTCCTCGC CCATGCATTC 
AGTOAAACAG GACAGAATAA CAGGATTCCA TTTGAGGACA GGACTGGGGA GTGCCTCAAG 
CGCACOGGAG CCACCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGGCC 
GCATTGATCC CTATCCCTGC CCT0CCA6CG TTCTCCCTCC AGGCTGCTGT CGTGGTGGTA 
TTCAATTTTG CTATCGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 
CXSTGAGOACA GAAGATTGGA TATTTTCTCC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCC6GTA CAGCCCCCCA 
CCCCCATACA CCAGCCACAO CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 
CAGCTCCGCA CACAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCCACGC 
TCTGAGATCT CTGTACACCC TGTTACOGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 
GAGAGCACCA GCTCTACCAG GGACCT6CTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 
CTCGAGCCCC CCTGCACCAA 0T6GACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 
TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGG 
GTCACCCTTT ATGGGACCAC CCGAGTCAGA GACGGGCTGG ACCTCACGGA CATTGTTCCC 
CGGGAAACCA GAGAATATGA CTTCATAGCT GCCCAGTTCA AGTACTTCTC TTTCTACAAC 
ATGTATATAG TCACCCA6AA AGCACACTAC CCGAATATCC AGCACCTACT TTACGACCTT 
CAXAAGACTT TCACCAATGT GAAGTATGTC ATGCTGGAGG AGAACAAGCA ACTTCCCCAA 
ATGTGGCTOC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGCATGCATT TGACAGTGAC 
TGGGAAACTG GGAGGATCAT 6CCAAACAAT TATAAAAATG GATCAGATOA CGGGGTCCTC 
GCTTACAAAC TCCTGGTGCA OACTGGCAGC CGAGACAACC CCATCGACAT TAGTCAGTTG 
ACTAAACAGC GTCTGGTACA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 
CTGACOGCTT GCGTCAGCAA CGACCCTGTA GCTTACGCTG CCTCCCAGGC CAACATCOGG 
OCTCACCCCC OGGMTGGGT CCaXOACAAA GCCOACTACA TOCCACAGAC CAGGCTOACA 2940

ATCCCA6CAG CAGAOCCCAT CCAGTAC6CT CAGTTCCCTT TCTACCTCAA COCCCTACCA 3000 

GACACCTCAO ACTTTGTCOA AGCCATAGAA AAAGTGA6AG TCATCTCTAA CAACTAIAC6 3060 

AGCCTOOGAC TGTCCAOCTA CCCCAATGGC TACCCCTTCC TOTTCTGOOA OCAATACATC 3120 

AOCKTOOOCC ACTOCCTGCT OCTATCCATC A0C6TOOT6C TGCCCTOCAC OITTCTA6TG 3180 

MCGCAOTCT TCCTCCTOAA CCCCTOOACO OCOOOCATCA ITGTCATGOT CCT6CCTCTG 3240 

ATOACCOTTO AOCTCTTTGG CATGAWOOC CTCATTCCGA TOIAGCTOAO TGCTGT6CCT 3300 

GTOOTCATCC TGATTOCATC TGTTOOCATC 6GACTGGACT TCACCOTCCA CGTCCCTTTG 3360 

GCCTTTCTCA CAOCXaTTGG GOACaAGAAC CAC»GGGCTA TGCTC6CTCT GGAACACATG 3420 

TTTGCTCCCG TTCTGCAOGO T0CT6TGTCC ACTCTCCTGO OTGIACTCAT CCTTOCAGGC 3480 

TCCGAATTTG ATTTCATTOT CAOATACTTC TTTGCtJGTCC TOOCCATTCT CACCGTCTTO 3640 

GCGGTTCTCA ATGCACTGGT TCTOCTGCCT OTCCTCTTAT CCTTCTTTGC ACCGTOTCCT 3600 

GAGGTGTCTC caOCCAATOO CCTAAAOOGA CT6CCCACTC CTTCGCCTCA GCCGCCTCCA 3660 

AGXCTCGTCC GGTTTGC06T GCCTCCTOGT CACAOGAACA ATG6GTCTGA TTCCTCCCAC 3720 

TOGGAGTACyv GCTCTCAGAC CACGCTGTCT 6CCATCA0T6 ACGACCTCAG GCAATACGAA 3780 

GCACAGCAGG 6T6COG6AG6 CCCTCCCCAC CAAOTGATTG TGCAAGCXac AGAAAACCCT 3840 

GTCTTTGCCC QOTCCACTGT GGTCCATCCG OACTCCAOAC ATCACCCTCC CTTCACCCCT 3900 

CGCCAACaGC CCCACCTGGA CTCTGCCTCC TTGTCCCCTG GACGGCAAOG CCaCCAGCCT 3960 

CGAAGOGATC CCCCTACAGA AGOCTTGOOO CCACCCCCCT ACAGACCOCG CACAGACCCT 4020 

TTTGAAATTT CTACTOAAGG GCAWCTOGC CCTAGCaATA GGGACCGCTC ACCCCCCC6T 4080 

6GGGCCCGTT CTCACAACCC TCOOAACCCA ACGTCCACCG CCATGGGCAG CTCTGTGCCC 4140 

AGCTACTCCC AOCCCyiTCAC CACTGTGACG GCTTCTOCTT COGTGACTCT TGCTGrGCAT 4200 

CCCCC6CCTO GAOCNOGOG CAACCCC06A GGGGGOCCCT GTCCAOGCTA X6AGAOCTAC 4260 

CCT6AGACTG ATCACOGGGT AITT«AGCAT CCTCATOTOC CTTTTCATGT CAGOTGTOAG 4320 

AGCAGGGACT CAAAGGIGGA CGTCaTACAG CTACAGOACG TGCAATGTCA G6AGAGGCCC 4380 

TGGGGOAOOk OCTCCAACTG AGGCTAATTA AAATCTCAAG CAAAGAGGCC AAAGATTGCA 4440 

AAOCCCCOCC CCCACCTCTT TCCAOAACTO CIT6AAGAGA ACTGCTTGGA ATTATCG6AA 4500 

OGCAGTTCAT TGTTACTCTA ACWOATTGTA TTATTKKGTO AAATATTTCT ATAAATATTT 4560 

AARA6CTGTA CACATCTAAT ATACATOGAA ATCCTOTACA GTCTATTTCC T06GGCCTCT 4620 
CCACTCCTGC CCCAOMTGG CGAOACCACA GOGCCCCTTT CCCCTOTOTA CATTGGTCTC 4680
TGTCCCACAA CCAAGCTTAA CTTAGTTTTA AAAAAAATCT CCCAGCATAT CTCCCTGCTC 4740
CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTGCTTAT GTAATAGGAT 4800
TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGG 4860
ATGAATTGTT ACTGTTAACT TTTOAACACG CTATCCGTGG TAATTGTTTA ACGAGCACAC 4920
ATGAAGAAAA CAGGTTAATC CCAGTGGCTT CTCTAGGOGT AGTTGTATAT GGTTCGCATG 4980
GGTGGATGTG TGTGTGCATG TGACTTTCCA ATGTACTGTA TTGTGGTTTG TTGTTGTTGT 5040
TGCTGTTGTT GTTCATTTTG GTGTTTTTGG TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100
GTGGGCTGGG AAGGTCCAGG TCTTTTTCTG TCGTCATGCT GGTGGAAAGG TGACCCCAAT 5160
CATCTGTCCT ATTCTCTGGG ACTATTC 5187

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1434 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Met Ala Ser Ala Gly Asn Ala Ala Gly Ala Leu Gly Arg Gln Ala Gly
1 5 10 15

Gly Gly Arg Arg Arg Arg Thr Gly Gly Pro His Arg Ala Ala Pro Asp
20 25

Arg Asp Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu
35 40

Glu Gln Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp
50 55 60

Leu Arg Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile
65 70

Gln Lys Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly
85 90

Ala Phe Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu
100 105 110

Glu Leu Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr
115 120 125
Thr Arg Gln Lys Ile Gly Glu Glu Ala Met Phe Asn Pro Gln Leu Met

130 135 140

Ile Gln Thr Pro Lys Glu Glu Gly Ala Asn Val Leu Thr Thr Glu Ala
145 150 155 160

Leu Leu Gln His Leu Asp Ser Ala Leu Gln Ala Ser Arg Val His Val
165 170 175

Tyr Met Tyr Asn Arg Gln Trp Lys Leu Glu His Leu Cys Tyr Lys Ser

180 185 190

Gly Glu Leu Ile Thr Glu Thr Gly Tyr Met Asp Gln Ile Ile Glu Tyr
195 200 205

Leu Tyr Pro Cys Leu Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
210 215 220

Ala Lys Leu Gln Ser Gly Thr Ala Tyr Leu Leu Gly Lys Pro Pro Leu
225 230 235 240

Arg Trp Thr Asn Phe Asp Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys
245 250 255

Ile Asn Tyr Gln Val Asp Ser Trp Glu Glu Met Leu Asn Lys Ala Glu
260 265 270

Val Gly His Gly Tyr Met Asp Arg Pro Cys Leu Asn Pro Ala Asp Pro
275 280 285

Asp Cys Pro Ala Thr Ala Pro Asn Lys Asn Ser Thr Lys Pro Leu Asp
290 295 300

Val Ala Leu Val Leu Asn Gly Gly Cys Gln Gly Leu Ser Arg Lys Tyr
305 310 315 320

Met His Trp Gln Glu Glu Leu Ile Val Gly Gly Thr Val Lys Asn Ala
325 330 335

Thr Gly Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu
340 345 350

Met Thr Pro Lys Gln Met Tyr Glu His Phe Arg Gly Tyr Asp Tyr Val
355 360 365

Ser His Ile Asn Trp Asn Glu Asp Arg Ala Ala Ala Ile Leu Glu Ala
370 375 380

Trp Gln Arg Thr Tyr Val Glu Val Val His Gln Ser Val Ala Pro Asn
385 390 395 400

Ser Thr Gln Lys Val Leu Pro Phe Thr Thr Thr Thr Leu Asp Asp Ile
405 410 415

Leu Lys Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr
420 425 430

Leu Leu Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys
435 440 445



Ser Lys Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala
450 455 460

Leu Ser Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser
465 470 475 480

Phe Asn Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val
485 490 495

Gly Val Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly
500 505 510

Gln Asn Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys
515 520 525

Arg Thr Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala
530 535 540

Phe Phe Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser
545 550 555 560

Leu Gln Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu
565 570 575

Ile Phe Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg
580 585 590

Arg Leu Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val
595 600 605

Ile Gln Val Glu Pro Gln Ala Tyr Thr Glu Pro His Ser Asn Thr Arg
610 615 620

Tyr Ser Pro Pro Pro Pro Tyr Thr Ser His Ser Phe Ala His Glu Thr
625 630 635 640

His Ile Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro
645 650 655

His Thr His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser
660 665 670

Val Gln Pro Val Thr Val Thr Gln Asp Asn Leu Ser Cys Gln Ser Pro
675 680 685

Glu Ser Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser
690 695 700

Ser Leu His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser
705 710 715 720

Phe Ala Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys
725 730 735

Val Val Val Ile Leu Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr
740 745 750
Gly Thr Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro
765

Arg Glu Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe
780

Ser Phe Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn
785 790 795 800

II. Gin Hi. I*u L.U Tyr Aop L.u Hi. ly. Sar Pho Ser Aa„ V.l Lys 

BUS 320 



815 



Tyr val Met Leu Gl« Gl« Ae„ Ly. cin Le« Pro cm Met Trp Hi. 

325 830 
Tyr Ph. Arg A.p Trp L.u Gin Gly L.« Gin Asp Ala Ph. A.p s.r A.p 

Trp Glu Thr Gly Arg II. Pro Asn A.n Tyr ly. A.„ Gly s.r A.p 

III ser Arg A.p 

^■^5 880 

I-ys Pro II. A.p 11. ser Gin L«. Thr Lye Gin Arg r,.u V.l A.p Al. 

890 895 

A.p Gly II, II. A.n Pro S.r Ala Ph. Tyr II. Tyr Leu Thr Al. Trp 

905 

V.1 ser A.n A.p Pro V.l Al. Tyr Al. Al. s.r Gin Al. A.n II. Arg 
Pro Hi. Arg Pro Glu Trp V.l Hi. A.p Ly. Al. Asp Tyr Met Pro Glu 

Thr Arg Leu Arg II. Pro Al. Al. Glu Pro II. Glu Tyr Al. Gin Ph. 

955 950 

pro Ph. Tyr Leu A.n Gly Leu Arg A.p Thr Ser A.p Phe v.l Glu Al. 

565 

He Clu Ly. V.l Arg v.l n. cy. A.n A.n Tyr Thr S.r L.u Gly Leu 

985 990 



8.r S.r ^r Pro A.„ oly ,y, p,o Ph. l.u Ph. Trp Glu Gin Tyr He 

iOOO 2005 

ser Leu Arg Hi. Trp Leu i.u Leu Ser II. ser V.l v.l Leu Al. Cy. 

WIS 1020 

Thr^Phe L.U V.l Cy. Al.^V.1 Phe Leu Leu A.n Pro Trp Thr Al. Gly 

^^^5 I04C 
II. II. v.1 Met V.1 i^u Al. Leu Met Thr Val Glu Leu Phe Gly Met 

aOSO 2055 
Met Gly Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu
1060 1065 1070



Ile Ala Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu
1075 1080

Ala Phe Leu Thr Ala Ile Gly Asp Lys Asn His Arg Ala Met Leu Ala
1090 1095 1100

Leu Glu His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu
1105 1110

Leu Gly Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg
1125 1130

Tyr Phe Phe Ala Val Leu Ala Ile Leu Thr Val Leu Gly Val Leu Asn
1140 1145 1150

Gly Leu Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Cys Pro
1155 1160 1165

Glu Val Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro
1170 1175 1180

Glu Pro Pro Pro Ser Val Val Arg Phe Ala Val Pro Pro Gly His Thr
1185 1190 1200

Asn Asn Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr
1205 1210 1215

Val Ser Gly Ile Ser Glu Glu Leu Arg Gln Tyr Glu Ala Gln Gln Gly
1220 1225 1230

Ala Gly Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro
1235 1240 1245

Val Phe Ala Arg Ser Thr Val Val His Pro Asp Ser Arg His Gln Pro
1250 1255 1260

Pro Leu Thr Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Ser
1265 1270 1275 1280

Pro Gly Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly
1285 1290 1295

Leu Arg Pro Pro Pro Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser
1300 1305 1310

Thr Glu Gly His Ser Gly Pro Ser Asn Arg Asp Arg Ser Gly Pro Arg
1315 1320 1325

Gly Ala Arg Ser His Asn Pro Arg Asn Pro Thr Ser Thr Ala Met Gly
1330 1335 1340

Ser Ser Val Pro Ser Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser
1345 1350 1355 1360

Ala Ser Val Thr Val Ala Val His Pro Pro Pro Gly Pro Gly Arg Asn
1365 1370 1375
Pro Arg Gly Gly Pro Cys Pro Gly Tyr Glu Ser Tyr Pro Glu Thr Asp
1380 1385 1390

His Gly Val Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu
1400 1405

mo''"' ^^"^ Ills'""' ^'"-^^P ^«
1420

Glu Glu Arg Pro Trp Gly Ser Ser Ser Asn
1425 1430

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Leu Ile Val Gly Gly
1 5 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
Pro Phe Phe Trp Glu Gln Tyr



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GGACGAATTC AARGTNCAYC ARYTNTGG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GGACGAATTC CYTCCCARAA RCANTC 
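
The primers of SEQ ID NO: 14 and SEQ ID NO: 15 are written with IUPAC ambiguity codes (R, Y, N), so each entry stands for a pool of concrete oligonucleotides. The following Python sketch, which is illustrative only and not part of the original listing, counts and enumerates that pool. The helper names `degeneracy` and `expand` are hypothetical, and the primer strings are assumed to be the two sequences above with the spacing removed.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for combo in product(*(IUPAC[base] for base in primer)):
        yield "".join(combo)


# Assumed transcription of SEQ ID NO: 14 and SEQ ID NO: 15 (spaces removed).
SEQ14 = "GGACGAATTCAARGTNCAYCARYTNTGG"
SEQ15 = "GGACGAATTCCYTCCCARAARCANTC"

if __name__ == "__main__":
    for name, primer in (("SEQ ID NO: 14", SEQ14), ("SEQ ID NO: 15", SEQ15)):
        print(name, len(primer), "nt,", degeneracy(primer), "variants")
```

Under these assumptions the lengths agree with the stated 28 and 26 base pairs, and both primers begin with the same fixed ten bases, GGACGAATTC, ahead of the degenerate positions.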
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GGACGAATTC YTNGANTGYT TTTGGGA
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CATACCAGCC AACCTTGTCN GGCCARTGCA T 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5288 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 

GAATTCCCGG GACCGCAAGG AGTGCOGOOG AAGCGCCCCA ACCACAGGCT CGCTCOGOGC 60 

GCCCGCTCTC GCTCTTCOGC GAACTGOATG TGGGCAGCCC CGGCCGCAGA GACCTCGGGA 120 

CCCCOGOGCA ATCTGGCAAT GGAAGGOGCA GGCTCTGACT CCCCGGCAGC GGCOGCGGCC 180 

GCACCCGCAG CAGCGCCCGC CGTGTOACCA GCAGCAGCGG CTCGTCTGTC AACCGCAGCC 240 

OGAGCCCGAC CACCCTOCGG CCAGCAGCGT CCTCGCAAGC CGAGCGCCCA GGCGCGCCAG 300 

GAGCCCOCAG CAGOOGCAGC AGCGCCCCGG GCOGCCCGGG AAOCCTCCGT CCCCGCCGCG 360 

GCGGCGGOOG CGGCGGCGGC AACATGGCCT COGCTGGTAA CGCCCCCCAG CCCCAGGACC 420 

OCGGCGGCGG CGGCAGOGGC TGTATCGGTG CCCCGGGAOG CCCGGCTGGA GGCGGGAOGC 480 

GCAGACGGAC GGGGGGOCTG CCC06TGCTG CCGCGCCGCA CCGGGACTAT CTCCACCGGC 540 

CCAGCTACTG CGACCCCGCC TTCOCTCTGG AGCAGATTTC GAAGGGGAAG GCTACTGGCC 600 

GGAAAGCGCC ACTGTGGCTO AOAGCGAAGT TTCAGAGACT CTTATTTAAA CTGCGTTGTT 660 

ACATTCAAAA AAACTGCGGC AACTTCTTGG TTGTGGGCCT CCTCATATTT GGCGCCTTOG 720 
C6OT0OGATT AAAAOCJVOOO AACCTCOIkCA CCAACGTGGA GGAGCTGTGG GTOGAAOTTO 780

GAGOACXSAGT AA6TC6T0AA TTAAATTATA CTCGCCAGAA GATTGGAGAA CA06CTATGT 840 

TTAATCCTCA ACTCATGATA CAGACCCCTA AAGAAGAAGG TGCTAATGTC CTGACCACAG 900 

AAGCGCTCCT ACAACACCTG GACTCGOCAC TCCAGGCCAG CCGTGTCCAT GTATACATGT 960 

ACAACACOCA OTOGAAATTO GAACATTTGT 6TTACAAATC AOGAGAOCTT ATCaVCAGAAA 1020 

CAGCTTACAT OOATCaGATA ATAOAATATC TTTACCCTTG TTTOATTATT ACACCTTTGG 1080 

ACTCCTTCTO GOAAOGGG06 AAATTACAOT CTGGCACAGC ATACCTCCTA GGTAAACCTC 1140 

CTTTGCGGTG GACAAACTTC GACCCTTTGO AATTCCTGGA AGAGTTAAAG AAAATAAACT 1200 

ATCaUkOTGGA CAGCTOGGAG GAAATOCT6A ATAAGGCTGA GGTTGGTCAT GGTTACaTOG 1260 

ACC6CCCCTC CCTCAATCCG GCCGATCCAO ACTGCCCCGC CACAGCCCCC AACaUVAAATT 1320 

CAACCAAACC TCTTGATATG GCCCTTGTTT TGAATGGTGG ATGTCATCGC TTATCCAOAA 1380 

AGTATATGCA CTCGCA6GAG GAGTT6ATTO TGGGTGGCAC AGTCAAGAAC AGCACTGGAA 1440 

AACTCGTCAG CGCCCATGCC CTGCAGACCA TGrTCCAGTT AATGACTCCC AAGCAAATGT 1500 

ACGAGCACTT CAAGGGCTAC OAGTATGTCT CACACATCAA CTGGAACGAG GACAAAGC6G 1560 

CAGCCATCCT CGACCCCT6G CACAGCACAT ATGTGOAGGT GGTTCATCAG AGTGTCGCAC 1620 

AGAACTCCAC TCAAAAGCTG CTTTCCTTCA CCACCACGAC CCTOGACGAC ATCCTOAAAT 1680 

CCTTCTCTGA CGTC»GTGTC ATCCCCGTGG CCAGCOCCTA CTTACTCAT6 CTCGCCTATG 1740 

CCTCTCTAAC CATGCTGCGC TGOGACTGCT CCAAGTCCXy^ GGGTGCOGTG GGGCTGGCTC 1800 

GCGTCCTGCT GGTTGCACrrG TCAOTOGCTG CA6GACTGGG CCTGTGCTCA TTGATOGGAA 1860 

TTTCCTTTAA CGCTGCAACA ACTCAOGTTT TOCCATTTCT COCTCTTGGT GTTGGTGTCC 1920 

ATGATGTTTT TCTTCTGGCC CAOGCCTTCA GTGAAACaGG ACAGAATAAA AGAATCCCTT 1980 

TTGAGGACAG GACCOGGGAO TGCCTOAAGC GCACAGGACC CAGCGTGCCC CTCAtXJTCCA 2040 

TCAGCAATCT CACAOCCTTC rPCATGGCCG COTTAATCCC AATTCCCOCT CTOCGGGCGT 2100 

TCTCCCTCCA GGCAGCG6TA GTA0TCGT6T TCAATTTTGC CATGGTTCTG CTCATTTTTC 2160 

CTGCAATTCT CAGCATGGAT TTATATOGAC 6CGAGGACAG GAGACT6GAT ATTTTCTGCT 2220 

GTTTTACAA6 CCCCTGCGTC A6CAGAGTGA TTCAOGTTGA ACCTCAGGCC TACACC6ACA 2280 

CACAC6ACAA TACCCGCTAC AGCCCCCCAC CTCCCTACAO CAGCCACAGC TTT6CCCAT6 2340 

AAACGCACAT TACC»TGCa6 TCCACTOTCC AGCTCCGCAC GOAOTACGAC CCCCACACGC 2400 

ACGTGTACTA CACCACCGCT GAGCCeCGCT CCGAGATCTC TGTGCAGCCC GTCAOOGTGA 2460 



CACAOOACAC OCICAOCTOC CACAOCCCAO AOAOCACCAO CTCCACAAOO OACCIGCTCT 2520 

CCCAOTTCTC OOACTCCAOC CTCCACTCCC TOGAGOCCCC CTGTAC6AAG TGOACACTCT 2580 

CATCTTTTCC T0A6AA0GAC TATQCTCCTT TCCTCTTOAA ACCAAAACCC AACGTAGTOC 2640 

TGATCTTCCI TTTTCTCCOC TTGOTOOCCQ TOAOCCTTTA TGGCACCACC CCAGT6A6AG 2700 

ACGGGCTGCA CCTTACOGAC ATTOTACCTC GOOAAACCAO AGAATAT6AC TTTATTGCTC 2760 

CACAATTCAA ATACTTTTCT TTCTACAACA TOTATATAGT CACCCAGAAA OCAGACTACC 2820 

CGAATATCCA GCaCTTACTT TAOOACCTAC ACAOOAOTTT CAGIAACOTG AAGTATOTCA 2880 

T6TTGGAA6A AAACAAACAO CTTCCCAAAA TGTOCCTOCA CIACTTCAGA GACTGGCTTC 2940 

ACOGACTTCA OOATOCATTT OACAOTOACT GGGAAACCGG OAAAATCATC CCAAACAATT 3000 

ACa^AGAATGO ATCAGACGAT GGAGTCCTTC CCTACAAACT CCTG6TCCAA ACCGGCAGCC 3060 

OCOATAAOCC CATC6ACATC AGCCA6TT0A CTAAACAOCG TCTGCTGGAT OCA6ATOOCA 3120 

TCATTAATCC CAGC6CTTTC lACATCTACC TOAOOGCTTG GGTCAOCAAC OACCCCCTCG 3180 

C0TATGCT6C CTCCCAGGCC AACAXCCGGC CACACOGACC AGAATGGGTC CACGACAAAC 3240 

COGACTACAT GCCTGAAACA AGCCT0A6AA TCCOGGCAGC AGAGCOCATC GACTAIGCCC 3300 

AOrrcCCTTT CTACCTCAAC GGCTTGCGGG ACACCTCAGA CTrTOTGGAG GCAATTGAAA 3360 

AAGTAAGCAC CATCTOCAGC AACIAXA06A GCCTGGGGCT OTCCAGTTAC CCCAACGCCT 3420 

ACCCCTTCCT CTTCTOGGAG CAGTACATCO OCCTCOOCCA CTCGCTOCTC CTGTTCATCA 3480 

CCCTGCTCTT GCCCTOCACA TTCCTCCTCT GOGCTGTCTT CCTTCIGAAC CCCTGGAOGG 3540 

CCGGGATCAT TCTQATG6TC CTGGOOCTOA TGA0GGTC6A CCTGTTOGGC ATGATCC6CC 3600 

TCATOGCAAT CAAGCTCA6T 6OC»T0CC00 TOOTCATCCT GATCGCTTCT GTTCGCATAG 3660 

GACTGCAGTT CACCGTTCAC GTTOCTTXtJG CCTTICTOAC GGCCATOGCC GACOAGAACC 3720 

GCAGGCCTGT OCTTOCCCTG OAOCACATCT TTGCACCOGT CCTGGATCGC OCCCTGTCCA 3780 

CTCTGCTGGC AOTGCTCATG CTCGOGGCAT CTOAOTTCGA CTTCaTlGTC AOGTATTTCT 3840 

TTGCTCIGCT GCCGATCCTC ACCATCCICO 60GTTCTCAA TOGGCTOGTT TIGCITCCCC 3900 

TCCTTTTGTC TTTCTTTGGA CCATATCCTG AGGTGTCTCC AGCCAAOOCC TTGAACCCCC 3960 

TCCCCACACC CTCOCCIGAG CCACCCCCCA GCGTGGTCCG CTTCGCCATC CCGCCOGGCC 4020 

ACACGCACAG CGGCTCTGAT TOCTCCGACT COOAGTATAG TTCCCACAC6 ACAGTCTCAG 4080 

60CTCAGCGA GGAGCTTOGG CACTACOAOO CCCAGCAGGG OGCGGGACGC CCTGCCCACC 4140 

AAGTGATCGT GOAAOCCACA 6AAAACCC00 TCTTCOCOCA CTCCACWGTG CTCCaTOCCM 4200 
MVTCCAOOCA TCACCCKCCC TOGAI.CCCOA GKCAOCACCC CCACCTOGAC TCAfiCOTCCC 4260
TCCCTCCCOO ACGOCAXOOC CACCACCCCC OCAGGGACCC CCCCAGAGAA GGCTTGTOGC 4320
CACCCCtCTA CAOACCGCGC A6AGACGCTT TTGAAATTTC TACTGAAGGG CATTCTGGCC 4380
CTAGCAATAG GOCCCXSCTGC GCCCCTCGCG GGGCCCGTTC TCACAACCCT CGGAACCCAG 4440
CGTCCACTGC CATGOGCAGC TCOGTGCOCG GCTACTGCCA GCCCATCACC ACTGTGACGG 4500
CTTCTCCCTC COTGACTGTC GCCGTGCACC OGCCGCCTGT CCCTGGGCCT OGGCGGAACC 4560
CCCGAGGGGG ACTCTGCCCA GGCTACCCTG AGACTGACCA CGGCCTGTTT GAOGACCCCC 4620
ACGTGCCTTT CCACGTCCGG TGTGAGAGGA GGGATTCGAA GGTGGAMTC ATTGAGCTGC 4680
AGGACGTCGA ATGCGAOGAG AGGCCCCGGG GAAGOUJCTC CAACTGAGGG TGATTA*AAT 4740
CTGAAGCAAA GAGGCCAAAG ATTGGAAACC CCCCACCCCC ACCTCTTTCC AGAACTGCTT 4800
GAAGAGAACT GGTTGGA6TT ATGGAAAAGA TGCCCTGTCC CAGGACAGCA GTTCATTGTT 4860
ACTGTAACCG ATTCTATTAT TTTGTTAAAT ATTTCTATAA ATATTTAAGA GATCTACACA 4920
TOTGTAATAT A0GAAC6AAG GATGTAAAGT GGTATGATCT GGGGCTTCTC CACTCCTGCC 4980
CCAGAGTGTG GAGGCCACAO TOGGGCCTCT CCOTATTTGT GCATTGGGCT COGTGCCACA 5040
ACCAAGCTTC ATTAGTCTTA AATTTCAGCA TATGTXGCTG GTGCTTAAAT ATTGTATAAT 5100
TTACTTGTAT AATTCTATGC AAATATTGCT TATGTAATAG 6ATTATTTTG TAAAGGTTTC 5160
TCTTTAAAAT ATTTTAAATT TOCATATCAC AACCCTGTGG TAGTATGAAA TGTTACTGTT 5220
AACTTTCAAA CAOGCTATOC GTGAtAATTT TTTTGTTTAA TGAGCAOATA TGAAGAAACC 5280
CCGGAATT 5288

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1447 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Met Ala Ser Ala Gly Asn Ala Ala Glu Pro Gln Asp Arg Gly Gly Gly
1 5

Gly Ser Gly Cys Ile Gly Ala Pro Gly Arg Pro Ala Gly Gly Gly Arg
20 25
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Arg Arg Arg Thr Oly Gly hmu Arg Arg Ala Ala Ala Pro Asp Arg Aap 
35 40 45 

Tyr Leu Hia Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu Clu Gin 
50 55 60 

He Ser Lya Gly Lye Ala Thr Gly Arg Lye Ala Pro Leu Trp Leu Ara 
65 70 75 80 

Ala Lya Phe Oln Arg Leu Leu Phe Lye Leu Gly Cys Tyr He Gin Lye 
85 90 95 

Aen eye Gly Lya Phe Leu Val Val Gly Leu Leu He Phe Oly Ala Phe 
100 105 110 

Ala Val Gly Leu Lya Ala Ala Aan Leu Glu Thr Asn Val Glu Glu Leu 
"5 120 125 

Trp Val Glu Val Gly Oly Arg Val Ser Arg Glu Leu Asn Tyr Thr Arg 
130 135 140 

Gin Lya He Gly Glu Glu Ala Net Phe Asn Pro Gin Leu Met He Gin 

150 155 160 

Thr Pro Lya Olu Olu Cly Ala Asn Val Leu Thr Thr Glu Ala Leu Leu 
165 170 175 

Gin His Leu Aep Ser Ala Leu Gin Ala Ser Arg Val His Val Tyr Met 
180 185 190 

Tyr Aan Arg Gin Trp Lye Leu Glu Hia Leu Cys Tyr Lya Ser Gly Glu 
195 200 205 

Leu He Thr Olu Thr Gly Tyr Met Aap Gin He He Glu Tyr Leu Tyr 
210 215 220 

Pro Cys Leu He He Thr Pro Leu Asp Cys Phe Trp Glu oly Ala Lvs 
225 230 235 240 

Leu Oln Ser Gly Thr Ala Tyr Leu Leu Oly Lys Pro Pro Leu Arg Trp 
245 250 255 

Thr Aan Phe Aap Pro Leu Glu Phe Leu Glu Glu Leu Lys Lys He Aan 
260 265 270 

Tyr Gin Val Aap Ser Trp Olu Olu Met Leu Aen Lya Ala Olu Val Gly 
275 280 285 

Hie Oly Tyr Met Asp Arg Pro Cys Leu Aan Pro Ala Asp Pro Aap Cya 
290 295 300 



Pro Ala Thr Ala Pro Asn Lya Aan Ser Thr Lye Pro Leu Aap Met Ala 
305 310 315 320 

Leu Val Leu Aan Gly Oly Cye Hia Cly Leu Ser Arg Lya Tyr Met Hie 
325 330 335 

Trp Gin Glu Clu Leu He Val cly Cly Thr Val Lya Asn Ser Thr Oly 
340 345 350

Lys Leu Val Ser Ala His Ala Leu Gln Thr Met Phe Gln Leu Met Thr
355 360 365

Pro Lys Gln Met Tyr Glu His Phe Lys Gly Tyr Glu Tyr Val Ser His
370 375 380

Ile Asn Trp Asn Glu Asp Lys Ala Ala Ala Ile Leu Glu Ala Trp Gln
385 390 395 400

Thr Tyr Val Glu Val Val His Gln Ser Val Ala Gln Asn Ser Thr
405 410 415

Gln Lys Val Leu Ser Phe Thr Thr Thr Thr Leu Asp Asp Ile Leu Lys
420 425 430

Ser Phe Ser Asp Val Ser Val Ile Arg Val Ala Ser Gly Tyr Leu Leu
435 440 445

Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys Ser Lys
450 455 460

Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala Leu Ser
465 470 475 480

Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser Phe Asn
485 490 495

Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val Gly Val
500 505 510

Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly Gln Asn
515 520 525

Lys Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys Arg Thr
530 535 540

Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala Phe Phe
545 550 555 560

Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser Leu Gln
565 570 575

Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu Ile Phe
580 585 590

Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg Arg Leu
595 600 605

Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val Ile Gln
610 615 620

Val Glu Pro Gln Ala Tyr Thr Asp Thr His Asp Asn Thr Arg Tyr Ser
625 630 635 640

Pro Pro Pro Pro Tyr Ser Ser His Ser Phe Ala His Glu Thr Gln Ile
645 650 655

Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro His Thr
660 665 670 

His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser Val Gln
675 680 685

Pro Val Thr Val Thr Gln Asp Thr Leu Ser Cys Gln Ser Pro Glu Ser
690 695 700

Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser Ser Leu
705 710 715 720

His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser Phe Ala
725 730 735

Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys Val Val
740 745 750

Val Ile Phe Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr Gly Thr
755 760 765

Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro Arg Glu
770 775 780

Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe Ser Phe
785 790 795 800

Tyr Asn Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn Ile Gln
805 810 815

His Leu Leu Tyr Asp Leu His Arg Ser Phe Ser Asn Val Lys Tyr Val
820 825 830

Met Leu Glu Glu Asn Lys Gln Leu Pro Lys Met Trp Leu His Tyr Phe
835 840 845

Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp Trp Glu
850 855 860

Thr Gly Lys Ile Met Pro Asn Asn Tyr Lys Asn Gly Ser Asp Asp Gly
865 870 875 880

Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp Lys Pro
885 890 895

Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val Asp Ala Asp Gly
900 905 910

Ile Ile Asn Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp Val Ser
915 920 925

Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg Pro His
930 935 940

Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu Thr Arg
945 950 955 960

Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe Pro Phe






965 970 975

Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala Ile Glu
980 985 990

Lys Val Arg Thr Ile Cys Ser Asn Tyr Thr Ser Leu Gly Leu Ser Ser
995 1000 1005

Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile Gly Leu
1010 1015 1020

Arg His Trp Leu Leu Leu Phe Ile Ser Val Val Leu Ala Cys Thr Phe
1025 1030 1035 1040

Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly Ile Ile
1045 1050 1055

Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met Met Gly
1060 1065 1070

Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu Ile Ala
1075 1080 1085

Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu Ala Phe
1090 1095 1100

Leu Thr Ala Ile Gly Asp Lys Asn Arg Arg Ala Val Leu Ala Leu Glu
1105 1110 1115 1120

His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu Leu Gly
1125 1130 1135

Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg Tyr Phe
1140 1145 1150

Phe Ala Val Leu Ala Ile Leu Thr Ile Leu Gly Val Leu Asn Gly Leu
1155 1160 1165

Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Tyr Pro Glu Val
1170 1175 1180

Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro Glu Pro
1185 1190 1195 1200

Pro Pro Ser Val Val Arg Phe Ala Met Pro Pro Gly His Thr His Ser
1205 1210 1215

Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr Val Ser
1220 1225 1230

Gly Leu Ser Glu Glu Leu Arg His Tyr Glu Ala Gln Gln Gly Ala Gly
1235 1240 1245

Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro Val Phe
1250 1255 1260

Ala His Ser Thr Val Val His Pro Glu Ser Arg His His Pro Pro Ser
1265 1270 1275 1280

Asn Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Pro Pro Gly
1285 1290 1295

Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly Leu Trp
1300 1305 1310

Pro Pro Leu Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser Thr Glu
1315 1320 1325

Gly His Ser Gly Pro Ser Asn Arg Ala Arg Trp Gly Pro Arg Gly Ala
1330 1335 1340

Arg Ser His Asn Pro Arg Asn Pro Ala Ser Thr Ala Met Gly Ser Ser
1345 1350 1355 1360

Val Pro Gly Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser Ala Ser
1365 1370 1375

Val Thr Val Ala Val His Pro Pro Pro Val Pro Gly Pro Gly Arg Asn
1380 1385 1390

Pro Arg Gly Gly Leu Cys Pro Gly Tyr Pro Glu Thr Asp His Gly Leu
1395 1400 1405

Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu Arg Arg Asp
1410 1415 1420

Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys Glu Glu Arg
1425 1430 1435 1440

Pro Arg Gly Ser Ser Ser Asn
1445
WHAT IS CLAIMED IS:

1. A DNA sequence other than present in a chromosome encoding a patched gene
other than the Drosophila patched gene or fragment thereof of at least about 12bp
different from the sequence of the Drosophila patched gene.

2. A DNA sequence according to Claim 1, wherein said patched gene is a
mammalian gene.

3. A DNA sequence according to Claim 1 for human, mouse, mosquito, butterfly
or beetle patched gene.

4. A DNA sequence according to Claim 3, wherein said DNA sequence is a
human sequence.

5. A DNA sequence according to Claim 4, wherein said DNA sequence is a
mouse sequence.

6. A DNA sequence according to Claim 1, wherein said DNA sequence is a
fragment of at least about 18bp.

7. A DNA sequence according to Claim 1 joined to a DNA sequence comprising
a restriction enzyme recognition sequence.

8. An expression cassette comprising a transcriptional initiation region functional
in an expression host, a DNA sequence according to Claim 1 under the
transcriptional regulation of said transcriptional initiation region, and a
transcriptional termination region functional in said expression host.

9. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is heterologous to said DNA sequence according to Claim 1.
10. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is homologous to said DNA sequence according to Claim 1 and
includes the enhancer region.

11. A cell comprising an expression cassette according to Claim 8 as part of an
extrachromosomal element or integrated into the genome of a host cell as a result of
introduction of said expression cassette into said host cell and the cellular progeny of
said host cell.

12. A cell according to Claim 11, further comprising the patched protein in the
cellular membrane of said cell.

13. A cell according to Claim 11, wherein said patched protein is a mouse patched
protein.

14. A cell according to Claim 11, wherein said patched gene is a human patched
protein.

15. A cell according to Claim 11, wherein said transcriptional initiation region is a
Drosophila patched gene transcriptional initiation region comprising the promoter
and enhancer joined to a heterologous gene.

16. A cell comprising an expression cassette comprising a transcriptional initiation
region functional in an expression host, said transcriptional initiation region
consisting of a 5' non-coding region regulating the transcription of patched protein
comprising the promoter and enhancer, a marker gene, and a transcriptional
termination region, as part of an extrachromosomal element or integrated into the
genome of a host cell as a result of introduction of said expression cassette into said
host, and the cellular progeny thereof.
17. A cell according to Claim 16, wherein said transcriptional initiation region is
the Drosophila region.

18. A method for following embryonic development employing the patched
protein in an embryo, said method comprising:
integrating an expression cassette comprising a transcriptional initiation region
functional in embryonic host cells, said transcriptional initiation region consisting of
a 5' non-coding region regulating the transcription of patched protein, a marker
gene, and a transcriptional termination region, wherein said embryonic host cells are
capable of developing into a fetus;
growing said embryonic host cells, whereby proliferation and differentiation
occur; and
locating cells comprising expression of the patched protein by means of
expression of said marker gene.

19. A method for producing patched protein, said method comprising:
growing a cell according to Claim 11, whereby said patched protein is
expressed; and
isolating said patched protein free of other proteins.

20. A method for screening candidate compounds for binding affinity to the
patched protein, said method comprising:
combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in said cell, a DNA
sequence according to Claim 1 comprising the entire coding sequence under the
transcriptional regulation of said transcriptional initiation region, and a
transcriptional termination region functional in said cell, expressing said patched
protein in said cell; and
assaying for the binding of said candidate compound to said patched protein.
21. A method for screening candidate compounds for agonist activity with the
patched protein, said method comprising:
combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in an expression
host, said transcriptional initiation region consisting of a 5' non-coding region
regulating the transcription of patched protein, a marker gene, and a transcriptional
termination region, as part of an extrachromosomal element or integrated into the
genome of a host cell; and
assaying for the expression of said marker gene.

22. A monoclonal antibody binding specifically to a patched protein, other than
the Drosophila patched protein.



WO 96/11260
PCT/US95/13233

[Drawing sheet 1/1]



INTERNATIONAL SEARCH REPORT

International application No.
PCT/US95/13233

A. CLASSIFICATION OF SUBJECT MATTER
IPC(6) : C12N 5/00, 15/00; C07H 21/00
US CL : 536/23.1; 435/69.1, 172.3, 240.2, 320.1

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S. : 536/23.1; 435/69.1, 172.3, 240.2, 320.1

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
Please See Extra Sheet.

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

Nature, Volume 341, issued 12 October 1989, Nakano et al.,
"A protein with several possible membrane-spanning domains
encoded by the Drosophila segment polarity gene patched",
pages 508-513, see the entire document.
Relevant to claim No.: 1-10 and 18

Cell, Volume 59, issued 17 November 1989, Hooper et al.,
"The Drosophila patched gene encodes a putative membrane
protein required for segmental patterning", pages 751-765,
see the entire document.
Relevant to claim No.: 1-10 and 18

Gene, Volume 112, issued 1992, Chavrier et al., "The
complexity of the Rab and Rho GTP-binding protein
subfamilies revealed by a PCR cloning approach", pages 261-264,
see the entire document.
Relevant to claim No.: 1-10

[X] Further documents are listed in the continuation of Box C. See patent family annex.



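* Special categories of cited documents:

"A" document defining the general state of the art which is not considered
to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is
cited to establish the publication date of another citation or other
special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other means

"P" document published prior to the international filing date but later than
the priority date claimed

"T" later document published after the international filing date or priority
date and not in conflict with the application but cited to understand the
principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be
considered novel or cannot be considered to involve an inventive step
when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be
considered to involve an inventive step when the document is
combined with one or more other such documents, such combination
being obvious to a person skilled in the art

"&" document member of the same patent family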



Date of the actual completion of the international search
19 JANUARY 1996

Date of mailing of the international search report
31 JAN 1996

Name and mailing address of the ISA/US
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703) 305-3230

Authorized officer
JASEMINE C. CHAMBERS
Telephone No. (703) 308-0196

Form PCT/ISA/210 (second sheet)(July 1992)*



C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

Biochemistry, Volume 31, No. 44, issued 1992, Ma et al.,
"Molecular cloning and characterization of rKlk10, a cDNA
encoding T-kininogenase from rat submandibular gland and
kidney", pages 10922-10928, see the experimental procedures on
page 10923.
Relevant to claim No.: 1-10

Gene, Volume 74, issued 1988, Thummel et al., "Vectors for
Drosophila P-element-mediated transformation and tissue culture
transfection", pages 445-456, see the entire document.
Relevant to claim No.: 8-10 and 18

Developmental Genetics, Volume 12, issued 1991, Perrimon et
al., "Generating lineage-specific markers to study Drosophila
development", pages 238-252, see the entire document.
Relevant to claim No.: 8-10 and 18

Form PCT/ISA/210 (continuation of second sheet)(July 1992)*



Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. [ ] Claims Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such
an extent that no meaningful international search can be carried out, specifically:

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:
Please See Extra Sheet.

1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable
claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers
only those claims for which fees were paid, specifically claims Nos.:

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this international search report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:
1-10 AND 18

Remark on Protest
[ ] The additional search fees were accompanied by the applicant's protest.
[ ] No protest accompanied the payment of additional search fees.

Form PCT/ISA/210 (continuation of first sheet(1))(July 1992)*



B. FIELDS SEARCHED

Electronic data bases consulted (Name of data base and where practicable terms used):

APS, DIALOG

search terms: patched, gene, cloning, PCR, human, mammalian, mouse, mosquito, butterfly, beetle, DNA, drosophila,
embryo, gal, galactosidase, develop, review, inventors' names

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single
inventive concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional
examination fees must be paid.

Group I, claims 1-10 and 18, drawn to a DNA sequence encoding a patched gene, an expression cassette, and a method
for following embryonic development by integrating the expression cassette.

Group II, claims 11-17 and 19, drawn to a cell and a method for producing patched protein by growing the cell.

Group III, claim 20, drawn to a method for screening candidate compounds for binding affinity to the patched protein.

Group IV, claim 21, drawn to a method for screening candidate compounds for agonist activity with the patched
protein.

Group V, claim 22, drawn to a monoclonal antibody specific for a patched protein.

The inventions listed as Groups I-V do not relate to a single inventive concept under PCT Rule 13.1, because, under
PCT Rule 13.2, they lack the same or corresponding special technical features for the following reasons:

Groups I, II and V are not so linked as to form a single inventive concept because the DNA sequence of Group I, the
cell of Group II and the monoclonal antibody of Group V are drawn to three different products.

Groups II and III are not so linked as to form a single inventive concept because they are drawn to materially different
methods. The method of Group II involves growing a cell while the method of Group III involves combining a
candidate compound with a cell and then assaying for binding.

Groups II and IV are not so linked as to form a single inventive concept because they are drawn to materially different
methods. The method of Group II involves growing a cell while the method of Group IV involves combining a
candidate compound with a cell and then assaying for expression of a marker gene.

Form PCT/ISA/210 (extra sheet)(July 1992)*



